Video Multi-Person Human Parsing

1. Metrics

We use three metrics to evaluate video instance-level (multi-person) human parsing:

a. Mean IoU (%) for semantic part segmentation, as defined in the FCN paper.

b. APr for human instance segmentation: following the Mask R-CNN paper, we report the mean of the mAP values computed at several IoU thresholds from 0.5 to 0.95.

c. APrvol for instance-level human parsing, as defined in the paper "Holistic, Instance-Level Human Parsing".
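The first two metrics can be sketched in NumPy as follows. This is not the official evaluation code: `mean_iou`, `average_precision`, and `ap_r` are hypothetical helper names, the 0.05 step between IoU thresholds follows the COCO convention and is an assumption, and AP uses greedy matching in descending score order with all-point interpolation of the precision-recall curve, which may differ from the official evaluator. APrvol is not covered.

```python
import numpy as np


def mean_iou(preds, gts, num_classes):
    """Mean IoU (%) over classes, with the confusion matrix accumulated over all frames.

    preds, gts: sequences of integer label maps (values in [0, num_classes)) of equal shape.
    """
    conf = np.zeros((num_classes, num_classes), dtype=np.int64)
    for pred, gt in zip(preds, gts):
        # Row = ground-truth class, column = predicted class.
        idx = gt.astype(np.int64).ravel() * num_classes + pred.astype(np.int64).ravel()
        conf += np.bincount(idx, minlength=num_classes ** 2).reshape(num_classes, num_classes)
    inter = np.diag(conf)
    union = conf.sum(axis=0) + conf.sum(axis=1) - inter
    valid = union > 0  # ignore classes absent from both prediction and ground truth
    return 100.0 * float(np.mean(inter[valid] / union[valid]))


def mask_iou(a, b):
    """IoU of two boolean masks."""
    union = np.logical_or(a, b).sum()
    return np.logical_and(a, b).sum() / union if union > 0 else 0.0


def average_precision(preds, gts, thresh):
    """AP at a single IoU threshold.

    preds: list of (frame_key, score, mask), one entry per predicted human instance.
    gts: dict mapping frame_key -> list of ground-truth instance masks.
    """
    num_gt = sum(len(masks) for masks in gts.values())
    if num_gt == 0:
        return 0.0
    matched = {key: [False] * len(masks) for key, masks in gts.items()}
    tp, fp = [], []
    # Greedy matching: highest-scoring predictions claim ground-truth instances first.
    for key, _, mask in sorted(preds, key=lambda p: -p[1]):
        best_iou, best_j = 0.0, -1
        for j, gt_mask in enumerate(gts.get(key, [])):
            if matched[key][j]:
                continue
            iou = mask_iou(mask, gt_mask)
            if iou > best_iou:
                best_iou, best_j = iou, j
        hit = best_j >= 0 and best_iou >= thresh
        if hit:
            matched[key][best_j] = True
        tp.append(1 if hit else 0)
        fp.append(0 if hit else 1)
    tp, fp = np.cumsum(tp), np.cumsum(fp)
    recall = tp / num_gt
    precision = tp / np.maximum(tp + fp, 1)
    # Make precision monotonically non-increasing, then integrate it over recall.
    mrec = np.concatenate(([0.0], recall, [1.0]))
    mpre = np.concatenate(([0.0], precision, [0.0]))
    for i in range(len(mpre) - 2, -1, -1):
        mpre[i] = max(mpre[i], mpre[i + 1])
    steps = np.where(mrec[1:] != mrec[:-1])[0]
    return float(np.sum((mrec[steps + 1] - mrec[steps]) * mpre[steps + 1]))


def ap_r(preds, gts, thresholds=None):
    """APr (%): mean AP over IoU thresholds, by default 0.50, 0.55, ..., 0.95 (assumed step)."""
    if thresholds is None:
        thresholds = np.linspace(0.5, 0.95, 10)
    return 100.0 * float(np.mean([average_precision(preds, gts, t) for t in thresholds]))
```

For video, `frame_key` can be any hashable id that is unique per frame, such as a (video, frame) pair.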

2. Submission format

The submission is a zip archive, mp_results.zip (a template file is provided for download), containing 50 sub-folders, one per test-set video. Each video folder contains 3 sub-folders:

  1. Global_parsing
  2. Instance_segmentation: this folder, named "instance_segmentation", contains two kinds of files:
    1. id.png: an instance segmentation index image with exactly the same size as the input frame. Each human instance is assigned a unique index id; 0 is always the background label.
    2. id.txt: a text file with one line per human instance. The first line corresponds to human instance index 1 in id.png, the second line to index 2, and so on.
  3. Instance_parsing: this folder, named "instance_parsing", contains two kinds of files:
    1. An indexed PNG image with the part segmentation, in which each nonzero value identifies a unique part; 0 is always the background label.
    2. A text file in which each line has the format <class_id score>. The first line corresponds to value 1 in the indexed PNG, the second line to value 2, and so on.
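The instance_parsing layout can be sketched with the Python standard library alone. This is an illustration under assumptions, not a reference writer: `write_index_png` and `write_instance_parsing` are hypothetical names, frame files are assumed to be named `<frame_id>.png` / `<frame_id>.txt` like the id.png / id.txt pair above, the angle brackets in <class_id score> are read as notation (each line holds the two values separated by a space), and the PNG is written as 8-bit grayscale with the index stored in each pixel value. Check whether the official evaluator expects palette-mode PNGs instead.

```python
import struct
import zlib
from pathlib import Path


def _png_chunk(tag, data):
    """Length + tag + data + CRC32(tag + data), as required by the PNG format."""
    body = tag + data
    return struct.pack(">I", len(data)) + body + struct.pack(">I", zlib.crc32(body) & 0xFFFFFFFF)


def write_index_png(path, rows):
    """Write equal-length rows of ints in [0, 255] as an 8-bit grayscale PNG."""
    height, width = len(rows), len(rows[0])
    # Each scanline is prefixed with filter type 0 (None).
    raw = b"".join(b"\x00" + bytes(row) for row in rows)
    # width, height, bit depth 8, color type 0 (grayscale), compression, filter, interlace.
    ihdr = struct.pack(">IIBBBBB", width, height, 8, 0, 0, 0, 0)
    png = (b"\x89PNG\r\n\x1a\n" + _png_chunk(b"IHDR", ihdr)
           + _png_chunk(b"IDAT", zlib.compress(raw)) + _png_chunk(b"IEND", b""))
    Path(path).write_bytes(png)


def write_instance_parsing(root, video, frame_id, part_map, parts):
    """Write one frame of instance_parsing output under root/video/instance_parsing/.

    part_map: rows of part indices (0 = background, k = k-th part).
    parts: list of (class_id, score); parts[k - 1] describes index k in part_map.
    """
    max_index = max(max(row) for row in part_map)
    if max_index > len(parts):
        raise ValueError(f"index {max_index} in part_map has no line in the text file")
    out_dir = Path(root) / video / "instance_parsing"
    out_dir.mkdir(parents=True, exist_ok=True)
    write_index_png(out_dir / f"{frame_id}.png", part_map)
    lines = [f"{class_id} {score}" for class_id, score in parts]
    (out_dir / f"{frame_id}.txt").write_text("\n".join(lines) + "\n")
```

The length check enforces the rule that line k of the text file describes value k of the PNG, so every nonzero index has a matching line.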

3. Class Definition

  0. Background
  1. Hat
  2. Hair
  3. Glove
  4. Sunglasses
  5. Upper-clothes
  6. Dress
  7. Coat
  8. Socks
  9. Pants
  10. Torso-skin
  11. Scarf
  12. Skirt
  13. Face
  14. Left-arm
  15. Right-arm
  16. Left-leg
  17. Right-leg
  18. Left-shoe
  19. Right-shoe
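As a sketch, the class list can be used to decode an instance_parsing text file. This assumes class ids are 0-based in the order listed (consistent with Background being label 0 in the submission format) and that each line holds a class id and a score separated by whitespace; `CLASS_NAMES` and `parse_part_lines` are hypothetical names.

```python
# Class names in the order listed above; assumes class id == position, so Background = 0.
CLASS_NAMES = [
    "Background", "Hat", "Hair", "Glove", "Sunglasses", "Upper-clothes",
    "Dress", "Coat", "Socks", "Pants", "Torso-skin", "Scarf", "Skirt",
    "Face", "Left-arm", "Right-arm", "Left-leg", "Right-leg",
    "Left-shoe", "Right-shoe",
]


def parse_part_lines(text):
    """Map each part index (line number, starting at 1) to (class_name, score)."""
    parts = {}
    for index, line in enumerate(text.splitlines(), start=1):
        class_id, score = line.split()
        cid = int(class_id)
        if not 0 <= cid < len(CLASS_NAMES):
            raise ValueError(f"line {index}: unknown class id {cid}")
        parts[index] = (CLASS_NAMES[cid], float(score))
    return parts
```

For example, a file containing the lines `3 0.9` and `14 0.75` decodes to part 1 = Glove and part 2 = Left-arm.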

4. Dataset Examples