2025
                        
                        
                        
                        
                        
Cross-Modal Causal Representation Learning for Radiology Report Generation                            
                            
                            
                            2024
                        
Customize your NeRF: Adaptive Source Driven 3D Scene Editing via Local-Global Iterative Training                            
                            
                            
                            
Universal semi-supervised model adaptation via collaborative consistency training                            
                            
                            
                            
Open-vocabulary segmentation with semantic-assisted calibration                            
                            
                            
                            
AlignSAM: Aligning segment anything model to open context via reinforcement learning
                            
                            
                            
Inter-domain mixup for semi-supervised domain adaptation                            
                            
                            
                            
FedDiv: Collaborative Noise Filtering for Federated Learning with Noisy Labels                            
                            
                            
                            
Removing Interference and Recovering Content Imaginatively for Visible Watermark Removal                            
                            
                            
                            
Variance-Insensitive and Target-Preserving Mask Refinement for Interactive Image Segmentation                            
                            
                            
                            
Structure embedded nucleus classification for histopathology images                            
                            
                            
                            2023
                        
Language-Aware Spatial-Temporal Collaboration for Referring Video Segmentation                            
                            
                            
                            
Unpaired Image-to-Image Translation based Domain Adaptation for Polyp Segmentation                            
                            
                            
                            
Adapting Object Size Variance and Class Imbalance for Semi-Supervised Object Detection                            
                            
                            
                            
De-biased Teacher: Rethinking IoU Matching for Semi-Supervised Object Detection                            
                            
                            
                            
Being Comes from Not-being: Open-vocabulary Text-to-Motion Generation with Wordless Training                            
                            
                            
                            
                            
Parametric Implicit Face Representation for Audio-Driven Facial Reenactment                            
                            
                            
                            
GP-VTON: Towards General Purpose Virtual Try-on via Collaborative Local-Flow Global-Parsing Learning                            
                            
                            
                            
Identity-Preserving Talking Face Generation with Landmark and Appearance Priors                            
                            
                            
                            
Visual Exemplar Driven Task-Prompting for Unified Perception in Autonomous Driving                            
                            
                            
                            
                        
CLIP²: Contrastive Language-Image-Point Pretraining from Real-World Point Cloud Data                            
                            
                            
                            
DetCLIPv2: Scalable Open-Vocabulary Object Detection Pre-training via Word-Region Alignment                            
                            
                            
                            
Improved Distribution Matching for Dataset Condensation                            
                            
                            
                            
Semi-DETR: Semi-Supervised Object Detection with Detection Transformers                            
                            
                            
                            
Divide and Adapt: Active Domain Adaptation via Customized Learning                            
                            
                            
                            
                            
Learning to Segment Every Referring Object Point by Point                            
                            
                            
                            
Dynamic Graph Enhanced Contrastive Learning for Medical Report Generation                            
                            
                            
                            
Advancing Visual Grounding with Scene Knowledge: Benchmark and Method                            
                            
                            
                            
CapDet: Unifying Dense Captioning and Open-World Detection Pretraining                            
                            
                            
                            
                        
DenseLight: Efficient Control for Large-scale Traffic Signals with Dense Feedback                            
                            
                            
                            
                        
IRA-FSOD: Instant-Response and Accurate Few-shot Object Detector                            
                            
                            
                            
                        
Taylor Neural Network for Real-World Image Super-Resolution                            
                            
                            
                            
Multi-Stage Spatio-Temporal Aggregation Transformer for Video Person Re-identification                            
                            
                            
                            
                        
Multi-Person 3D Pose Estimation with Occlusion Reasoning
                            
                            
                            
Discourse-Aware Graph Networks for Textual Logical Reasoning                            
                            
                            
                            
                        
NLIP: Noise-robust Language-Image Pre-training                            
                            
                            
                            
                        
Prototypical Graph Contrastive Learning                            
                            
                            
                            
Template-Based Contrastive Distillation Pretraining for Math Word Problem Solving                            
                            
                            
                            
                        
Fine-grained Face Editing via Personalized Spatial-aware Affine Modulation                            
                            
                            
                            
Scene Graph to Image Synthesis via Knowledge Consensus                            
                            
                            
                            
                        
Urban Regional Function Guided Traffic Flow Prediction                            
                            
                            
                            
                        
Causality-aware Visual Scene Discovery for Cross-Modal Question Reasoning                            
                            
                            
                            
                        2022
                        
Semantic-Aware Representation Blending for Multi-Label Image Recognition with Partial Labels                            
                            
                            
                            
                        
A Causal Debiasing Framework for Unsupervised Salient Object Detection                            
                            
                            
                            
                        
A Causal Inference Look At Unsupervised Video Anomaly Detection                            
                            
                            
                            
                        
Unsupervised Domain Adaptive Salient Object Detection Through Uncertainty-Aware Pseudo-Label Learning                            
                            
                            
                            
                        
Early Prediction of Blastocyst Development via Time-Lapse Video Analysis                            
                            
                            
                            
                            
Semantic-aware Temporal Channel-wise Attention for Cardiac Function Assessment                            
                            
                            
                            
                            
Cross-level contrastive learning and consistency constraint for semi-supervised medical image segmentation                            
                            
                            
                            
View-Disentangled Transformer for Brain Lesion Detection                            
                            
                            
                            
X-Trans2Cap: Cross-Modal Knowledge Transfer using Transformer for 3D Dense Captioning
                            
                            
                            
                        
BoxPolyp: Boost Generalized Polyp Segmentation using Extra Coarse Bounding Box Annotations                            
                            
                            
                            
Semi-Supervised Spatial Temporal Attention Network for Video Polyp Segmentation                            
                            
                            
                            
Less is More: Adaptive Curriculum Learning for Thyroid Nodule Diagnosis                            
                            
                            
                            
Attentive Symmetric Autoencoder for Brain MRI Segmentation                            
                            
                            
                            
Multi-Modal Masked Autoencoders for Medical Vision-and-Language Pre-Training                            
                            
                            
                            
Lesion-aware Dynamic Kernel for Polyp Segmentation                            
                            
                            
                            
Multi-level Consistency Learning for Semi-supervised Domain Adaptation                            
                            
                            
                            
                        
Double-Check Soft Teacher for Semi-Supervised Object Detection                            
                            
                            
                            
Hybrid-Order Representation Learning for Electricity Theft Detection                            
                            
                            
                            
                        
Online Metro Origin-Destination Prediction via Heterogeneous Information Aggregation                            
                            
                            
                            
                        
Semantic-Aware Auto-Encoders for Self-supervised Representation Learning                            
                            
                            
                            
Continual Object Detection via Prototypical Task Correlation Guided Gating Mechanism                            
                            
                            
                            
                        
Divide and Contrast: Source-free Domain Adaptation via Adaptive Contrastive Learning                            
                            
                            
                            
Structure-Preserving 3D Garment Modeling with Neural Sewing Machines                            
                            
                            
                            
Adversarial Reinforced Instruction Attacker for Robust Vision-Language Navigation                            
                            
                            
                            
Real-World Image Super-Resolution by Exclusionary Dual-Learning                            
                            
                            
                            
VQAMix: Conditional Triplet Mixup for Medical Visual Question Answering                            
                            
                            
                            
                        
Thyroid Region Prior Guided Attention for Ultrasound Segmentation of Thyroid Nodules                            
                            
                            
                            
                        
Centrality and Consistency: Two-Stage Clean Samples Identification for Learning with Instance-Dependent Noisy Labels                            
                            
                            
                            
                        
Neighborhood Collective Estimation for Noisy Label Identification and Correction                            
                            
                            
                            
                        
Align, Reason and Learn: Enhancing Medical Vision-and-Language Pre-training with Knowledge                            
                            
                            
                            
                        
Compound Batch Normalization for Long-tailed Image Classification                            
                            
                            
                            
                        
HairGAN: Spatial-Aware Palette GAN for Hair Color Transfer                            
                            
                            
                            
Multimodal Crowd Counting with Mutual Attention Transformers                            
                            
                            
                            
CX-ToM: Counterfactual explanations with theory-of-mind for enhancing human trust in image recognition models                            
                            
                            
                            
                        
Cross-Domain Action Recognition via Prototypical Graph Alignment                            
                            
                            
                            
                            
                        
Causal Reasoning Meets Visual Representation Learning: A Prospective Study                            
                            
                            
                            
                        
Cross-modal knowledge distillation for vision-to-sensor action recognition                            
                            
                            
                            
                        2021
                        
                        
REM-Net: Recursive Erasure Memory Network for Commonsense Evidence Refinement                            
                            
                            
                            
                        
Representative Local Feature Mining for Few-Shot Learning                            
                            
                            
                            
                        
Rethinking the Pruning Criteria for Convolutional Neural Network                            
                            
                            
                            
                        
Trash to Treasure: Harvesting OOD Data with Cross-Modal Matching for Open-Set Semi-Supervised Learning                            
                            
                            
                            
                        
Cross-Domain Facial Expression Recognition: A Unified Evaluation Benchmark and Adversarial Graph Learning
                            
                            
                            
                        
Deductive Learning for Weakly-Supervised 3D Human Pose Estimation via Uncalibrated Cameras                            
                            
                            
                            
                        
Robust Real-World Image Super-Resolution against Adversarial Attacks                            
                            
                            
                            
                        
Weakly-Supervised Spatio-Temporal Anomaly Detection in Surveillance Video                            
                            
                            
                            
                        
Learning Spatially Variant Linear Representation Models for Joint Filtering                            
                            
                            
                            
                        
Hierarchical Reasoning Network for Human-Object Interaction Detection                            
                            
                            
                            
                        
Joint Learning of Neural Transfer and Architecture Adaptation for Image Recognition                            
                            
                            
                            
                        
Cross-Modal Progressive Comprehension for Referring Segmentation                            
                            
                            
                            
                        
Contralaterally Enhanced Networks for Thoracic Disease Detection                            
                            
                            
                            
                        
Human-Centric Spatio-Temporal Video Grounding with Visual Transformers                            
                            
                            
                            
                        
Instance-Level Salient Object Segmentation                            
                            
                            
                            
                        
LapsCore: Language-guided Person Search via Color Reasoning                            
                            
                            
                            
                        
Colorectal Polyp Classification from White-light Colonoscopy Images via Domain Alignment                            
                            
                            
                            
                        
Multi-Layer Networks for Ensemble Precipitation Forecasts Postprocessing                            
                            
                            
                            
                        
Discovering Implicit Classes Achieves Open Set Domain Adaptation                            
                            
                            
                            
Mind the Context: The Impact of Contextualization in Neural Module Networks for Grounding Visual Referring Expressions                            
                            
                            
                            
                        2020
                        
                        
Learning Semi-supervised Multi-Label Fully Convolutional Network for Hierarchical Object Parsing                            
                            
                            
                            
                        
Unifying Temporal Context and Multi-feature with Update-Pacing Framework for Visual Tracking                            
                            
                            
                            
                        
Crowd Counting via Scale-Communicative Aggregation Networks                            
                            
                            
                            
                        
Adversarial Graph Representation Adaptation for Cross-Domain Facial Expression Recognition                            
                            
                            
                            
                            
                        
Knowledge Graph Transfer Network for Few-Shot Recognition                            
                            
                            
                            
                            
                        
GRADE: Automatic Graph-Enhanced Coherence Metric for Evaluating Open-Domain Dialogue Systems                            
                            
                            
                            
                        
Tree-Structured Policy based Progressive Reinforcement Learning for Temporally Language Grounding in Video                            
                            
                            
                            
                        
Transferable, Controllable, and Inconspicuous Adversarial Attacks on Person Re-identification with Deep Mis-Ranking                            
                            
                            
                            
                        
Dynamic Knowledge Routing Network For Target-Guided Open-Domain Conversation                            
                            
                            
                            
                        
Reinforcement Learning for Weakly Supervised Temporal Grounding of Natural Language in Untrimmed Videos                            
                            
                            
                            
                        
Online Alternate Generator Against Adversarial Attacks                            
                            
                            
                            
                        
Multi-Granularity Tracking with Modularlized Components for Unsupervised Vehicles Anomaly Detection                            
                            
                            
                            
                            
                        
Configurable Graph Reasoning for Visual Relationship Detection                            
                            
                            
                            
                        
Active Object Search                            
                            
                            
                            
                            
                        
Depthwise Non-local Module for Fast Salient Object Detection Using a Single Thread                            
                            
                            
                            
                        
Auto-Panoptic: Cooperative Multi-Component Architecture Search for Panoptic Segmentation                            
                            
                            
                            
                        
Unifying Relational Sentence Generation and Retrieval for Medical Image Report Composition                            
                            
                            
                            
                        
Unsupervised Multi-view Clustering by Squeezing Hybrid Knowledge from Cross View and Each View                            
                            
                            
                            
                        
Self-Enhanced Convolutional Network for Facial Video Hallucination                            
                            
                            
                            
                        
Deep Transformers For Fast Small Intestine Grounding In Capsule Endoscope Video                            
                            
                            
                            
                        
Collaborative Spatial-Temporal Modeling for Language-Queried Video Actor Segmentation                            
                            
                            
                            
                        
Propagating Over Phrase Relations for One-Stage Visual Grounding                            
                            
                            
                            
                        
Peeking into Occluded Joints: A Novel Framework for Crowd Pose Estimation                            
                            
                            
                            
                        
Graph-Structured Referring Expression Reasoning in The Wild                            
                            
                            
                            
                            
                        
A Real-Time Cross-modality Correlation Filtering Method for Referring Expression Comprehension                            
                            
                            
                            
                        2019
                        
Non-Local Context Encoder: Robust Biomedical Image Segmentation against Adversarial Attacks                            
                            
                            
                            
                            
                        
Context-Aware Semantic Inpainting                            
                            
                            
                            
                        
Cross-Modal Attentional Context Learning for RGB-D Object Detection                            
                            
                            
                            
                        
Visual Tracking via Dynamic Graph Learning                            
                            
                            
                            
                        
Facial Landmark Machines: A Backbone-Branches Architecture with Progressive Representation Learning                            
                            
                            
                            
                        
Progressively Diffused Networks for Semantic Visual Parsing                            
                            
                            
                            
                        
CamDrop: A New Explanation of Dropout and A Guided Regularization Method for Deep Neural Networks                             
                            
                            
                            
                        
Semi-supervised Skin Detection by Network with Mutual Guidance                            
                            
                            
                            
                        
ROSA: Robust Salient Object Detection against Adversarial Attacks                            
                            
                            
                            
                        
Neural Task Planning with And-Or Graph Representations                            
                            
                            
                            
                        
ClusterNet: Deep Hierarchical Cluster Network with Rigorously Rotation-Invariant Representation for Point Cloud Analysis                            
                            
                            
                            
                        
Spatially Variant Linear Representation Models for Joint Filtering                            
                            
                            
                            
                        
Layout-Graph Reasoning for Fashion Landmark Detection                            
                            
                            
                            
                        
Weakly-Supervised Discovery of Geometry-Aware Representation for 3D Human Pose Estimation                            
                            
                            
                            
                        
Knowledge-Driven Encode, Retrieve, Paraphrase for Medical Image Report Generation                            
                            
                            
                            
                        
Physical-Virtual Collaboration Modeling for Intra- and Inter-Station Metro Ridership Prediction
                            
                            
                            
                        
Semi-Supervised Video Salient Object Detection Using Pseudo-Labels                            
                            
                            
                            
                        
Fashion Retrieval via Graph Reasoning Networks on a Similarity Pyramid                            
                            
                            
                            
                            
                        
Motion Guided Attention for Video Salient Object Detection                            
                            
                            
                            
                        
Concrete Image Captioning by Integrating Content Sensitive and Global Discriminative Objective                            
                            
                            
                            
                        
Lightweight adversarial network for salient object detection                            
                            
                            
                            
                        
Multi-Task Learning For Thyroid Nodule Segmentation With Thyroid Region Prior                            
                            
                            
                            
                        
Cross-Domain Adaptive Clustering for Semi-Supervised Domain Adaptation                            
                            
                            
                            
                        
Bottom-Up Shift and Reasoning for Referring Image Segmentation                            
                            
                            
                            
                        
Collaborative Training between Region Proposal Localization and Classification for Domain Adaptive Object Detection                            
                            
                            
                            
                        
Referring Image Segmentation via Cross-Modal Progressive Comprehension                            
                            
                            
                            
                        
MetaSelection: Metaheuristic Sub-Structure Selection for Neural Network Pruning Using Evolutionary Algorithm                            
                            
                            
                            
                        
Lightweight Contrast Modeling for Attention-Aware Visual Localization                            
                            
                            
                            
                        
NADPEx: An On-policy Temporally Consistent Exploration Method for Deep Reinforcement Learning                            
                            
                            
                            
                        
Dynamic Graph Attention for Referring Expression Comprehension                            
                            
                            
                            
                            
                        
Simultaneous Lung Field Detection and Segmentation for Pediatric Chest Radiographs                            
                            
                            
                            
                        
Globally Guided Progressive Fusion Network for 3D Pancreas Segmentation                            
                            
                            
                            
                        
Semantic Relationships Guided Representation Learning for Facial Action Unit Recognition                            
                            
                            
                            
                            
                        2018
                        
Symbolic Graph Reasoning Meets Convolutions                            
                            
                            
                            
                        
Learning to Segment Object Candidates via Recursive Neural Networks                            
                            
                            
                            
                        
High-Precision Camera Localization in Scenes with Repetitive Patterns                            
                            
                            
                            
                        
Learning Support Correlation Filters for Visual Tracking                            
                            
                            
                            
                        
Proposal-free Network for Instance-level Semantic Object Segmentation                            
                            
                            
                            
                        
Crowd Counting using Deep Recurrent Spatial-Aware Network                            
                            
                            
                            
                        
DRPose3D: Depth Ranking in 3D Human Pose Estimation                            
                            
                            
                            
                        
Knowledge-Embedded Representation Learning for Fine-Grained Image Recognition                            
                            
                            
                            
                        
Flow Guided Recurrent Neural Encoder for Video Salient Object Detection
                            
                            
                            
                        
Interpretable Video Captioning via Trajectory Structured Localization                            
                            
                            
                            
                        
Fashion Retrieval via Graph Reasoning Networks on a Similarity Pyramid                            
                            
                            
                            
                        
Automatic Color Sketch Generation using Deep Style Transfer                            
                            
                            
                            
                        
Scene-Intuitive Agent for Remote Embodied Visual Grounding                            
                            
                            
                            
                        
Linguistic Structure Guided Context Modeling for Referring Image Segmentation                            
                            
                            
                            
                        
Zoom and Learn: Generalizing Deep Stereo Matching to Novel Domains                            
                            
                            
                            
                        
Fusing Object Context to Detect Functional Area for Cognitive Robots                            
                            
                            
                            
                        
Avoidance of High-speed Obstacles Based on Velocity Obstacles                            
                            
                            
                            
                        
Weakly Supervised Salient Object Detection Using Image Labels                            
                            
                            
                            
                            
                        
Learning deep representations for semantic image parsing: a comprehensive overview                            
                            
                            
                            
                        
Attentive Crowd Flow Machines                            
                            
                            
                            
                        
Facial Landmark Localization in the Wild by Backbone-Branches Representation Learning                            
                            
                            
                            
                        
Recurrent Attentional Reinforcement Learning for Multi-label Image Recognition                            
                            
                            
                            
                        
BeautyGAN: Instance-level Facial Makeup Transfer with Deep Generative Adversarial Network                            
                            
                            
                            
                        
Monocular Depth Estimation with Affinity, Vertical Pooling, and Label Enhancement                            
                            
                            
                            
                        
Embedding Temporally Consistent Depth Recovery for Real-time Dense Mapping in Visual-inertial Odometry                            
                            
                            
                            
                        
Reinforcement Cutting-Agent Learning for Video Object Segmentation                            
                            
                            
                            
                            
                        
Dynamic-structured Semantic Propagation Network                            
                            
                            
                            
                        
StepDeep: A Novel Spatial-temporal Mobility Event Prediction Framework based on Deep Neural Network                            
                            
                            
                            
                        
Unsupervised Domain Adaptation for Automatic Estimation of Cardiothoracic Ratio                            
                            
                            
                            
                        
Teaching Robots to Predict Human Motion                            
                            
                            
                            
                            
                        
A Modulation Module for Multi-task Learning with Applications in Image Retrieval                            
                            
                            
                            
                        
RCAA: Relational Context-Aware Agents for Person Search                            
                            
                            
                            
                        
Real-to-Virtual Domain Unification for End-to-End Autonomous Driving                            
                            
                            
                            
                        
Adversarial Geometry-Aware Human Motion Prediction                            
                            
                            
                            
                            
                        
CIRL: Controllable Imitative Reinforcement Learning for Vision-based Self-driving                            
                            
                            
                            
Generative Semantic Manipulation with Mask-Contrasting GAN                            
                            
                            
                            
                        
Reinforced Auto-Zoom Net: Towards Accurate and Fast Breast Cancer Segmentation in Whole-slide Images                            
                            
                            
                            
                            
                        
Deep Generative Models with Learnable Knowledge Constraints                            
                            
                            
                            
                        
Soft-Gated Warping-GAN for Pose-Guided Person Image Synthesis                            
                            
                            
                            
                        
Hybrid Retrieval-Generation Reinforced Agent for Medical Image Report Generation                            
                            
                            
                            
                        
Recurrent Attention Reinforcement Learning for Multi-label Image Recognition                            
                            
                            
                            
                        
Contrast-Oriented Deep Neural Networks for Salient Object Detection                            
                            
                            
                            
                        
Attentive LSTM Crowd Flow Machines                            
                            
                            
                            
                        2017
                        
Nonparametric Variational Auto-encoders for Hierarchical Representation Learning                            
                            
                            
                            
                        
Temporal Dynamic Graph LSTM for Action-driven Video Object Detection                            
                            
                            
                            
                        
Recurrent Topic-Transition GAN for Visual Paragraph Generation                            
                            
                            
                            
                        
Dual Motion GAN for Future-Flow Embedded Video Prediction                            
                            
                            
                            
                        
Attentive Contexts for Object Detection                            
                            
                            
                            
                        
Multi-stage Object Detection with Group Recursive Learning                            
                            
                            
                            
                        
Scale-aware Fast R-CNN for Pedestrian Detection                            
                            
                            
                            
                        
Saliency Detection on Light Field: A Multi-Cue Approach                            
                            
                            
                            
                        
Distance Metric Learning via Iterated Support Vector Machines                            
                            
                            
                            
                        
Human Parsing with Contextualized Convolutional Neural Network                            
                            
                            
                            
                        
WELD: Weighted Low-rank Decomposition for Robust Grayscale-Thermal Foreground Detection                            
                            
                            
                            
                        
Knowledge-Guided Recurrent Neural Network Learning for Task-oriented Action Prediction                            
                            
                            
                            
                            
                        
Attention-Aware Face Hallucination via Deep Reinforcement Learning                            
                            
                            
                            
                        
Recurrent 3D Pose Sequence Machines                            
                            
                            
                            
                            
                        
Look into Person: Self-supervised Structure-sensitive Learning and A New Benchmark for Human Parsing                            
                            
                            
                            
                        
Instance-Level Salient Object Segmentation                            
                            
                            
                            
                        
Crowd Counting via Multi-View Scale Aggregation Networks                            
                            
                            
                            
                        
Learning Object Interactions and Descriptions for Semantic Image Segmentation                            
                            
                            
                            
                        
Learning Patch-based Dynamic Graph for Visual Tracking                            
                            
                            
                            
                            
                        
Automatic colorization with improved spatial coherence and boundary localization                            
                            
                            
                            
                        
Structure-Preserving Image Super-resolution via Contextualized Multi-task Learning                            
                            
                            
                            
                        
Using 3D Face Priors for Depth Recovery                            
                            
                            
                            
                        
Reconstructing dynamic objects via LiDAR odometry oriented to depth fusion                            
                            
                            
                            
                            
                        
Image-to-Video Person Re-Identification with Temporally Memorized Similarity Learning                            
                            
                            
                            
                            
                        
Cluster synchronization for coupled systems with nonidentical linear dynamics                            
                            
                            
                            
                        
Cost-Effective Active Learning for Deep Image Classification                            
                            
                            
                            
                        
Face Recognition via Heuristic Deep Active Learning                            
                            
                            
                            
                        
Face Recognition by Coarse-to-Fine Landmark Regression with Application to ATM Surveillance                            
                            
                            
                            
                        
Face Attributes Recognition via Deep Multi-Task Cascade                            
                            
                            
                            
                        
Deep Dual Learning for Semantic Image Segmentation                            
                            
                            
                            
                        
Place-centric Visual Urban Perception with Deep Multi-instance Regression                            
                            
                            
                            
                        
Multi-label Image Recognition by Recurrently Discovering Attentional Regions                            
                            
                            
                            
                        
Perceptual Generative Adversarial Networks for Small Object Detection                            
                            
                            
                            
                        
Object Region Mining with Adversarial Erasing: A Simple Classification to Semantic Segmentation Approach                            
                            
                            
                            
                            
                        
Deep Variation-structured Reinforcement Learning for Visual Relationship and Attribute Detection                            
                            
                            
                            
                            
                        
Deep learning based subdivision approach for large scale macromolecules structure recovery from electron cryo tomograms                            
                            
                            
                            
                        
Poseidon: An Efficient Communication Architecture for Distributed Deep Learning on GPU Clusters                            
                            
                            
                            
                            
                        
Deep Attribute-preserving Metric Learning for Natural Language Object Retrieval                            
                            
                            
                            
                        
Dictionary Pair Classifier Driven Convolutional Neural Networks for Object Detection                            
                            
                            
                            
                        2016
                        
Visual Saliency Detection Based on Multiscale Deep CNN Features                            
                            
                            
                            
                        
Human Pose Estimation from Depth Images via Inference Embedded Multi-task Learning                            
                            
                            
                            
                            
                        
LSTM-CF: Unifying Context Modeling and Fusion with LSTMs for RGB-D Scene Labeling                            
                            
                            
                            
                        
Peak-Piloted Deep Network for Facial Expression Recognition                            
                            
                            
                            
                        
Semantic Object Parsing with Graph LSTM                            
                            
                            
                            
                            
                        
Tree-structured Reinforcement Learning for Sequential Object Localization                            
                            
                            
                            
                        
Learning to Segment with Image-level Annotations                            
                            
                            
                            
                        
Scale-aware Pixelwise Object Proposal Network                            
                            
                            
                            
                        
Learning to Segment Human by Watching YouTube                            
                            
                            
                            
                        
Learning Compositional Shape Models of Multiple Distance Metrics by Information Projection                            
                            
                            
                            
                        
Inference With Collaborative Model for Interactive Tumor Segmentation in Medical Image Sequences                            
                            
                            
                            
                        
Clothes Co-Parsing via Joint Image Segmentation and Labeling with Application to Clothing Retrieval                            
                            
                            
                            
                        
Detection-free Multi-object Tracking by Reconfigurable Inference with Bundle Representations                            
                            
                            
                            
                        
Recognizing Focal Liver Lesions in CEUS with Dynamically Trained Latent Structured Models                            
                            
                            
                            
                        
Deep Boosting: Joint Feature Selection and Analysis Dictionary Learning in Hierarchy                            
                            
                            
                            
                        
Geometric Scene Parsing with Hierarchical LSTM                            
                            
                            
                            
                        
Deep Structured Scene Parsing by Learning with Image Descriptions                            
                            
                            
                            
                            
                        
Semantic Object Parsing with Local-Global Long Short-Term Memory                            
                            
                            
                            
                            
                        
Dictionary Pair Classifier Driven Convolutional Neural Networks for Object Detection                            
                            
                            
                            
Joint Learning of Single-image and Cross-image Representations for Person Re-identification                            
                            
                            
                            
                        
Reversible Recursive Instance-level Object Segmentation                            
                            
                            
                            
                        
DARI: Distance metric And Representation Integration for Person Verification                            
                            
                            
                            
                        
Character Proposal Network for Robust Text Extraction                            
                            
                            
                            
                        
ColorSketch: A Drawing Assistant for Generating Color Sketches from Photos                            
                            
                            
                            
                        
Deep Contrast Learning for Salient Object Detection                            
                            
                            
                            
                        
STC: A Simple to Complex Framework for Weakly-supervised Semantic Segmentation                            
                            
                            
                            
                        
Class relatedness oriented-discriminative dictionary learning for multiclass image classification                            
                            
                            
                            
                        
Learning Collaborative Sparse Representation for Grayscale-Thermal Tracking                            
                            
                            
                            
                        
A Stochastic Image Grammar for Fine-grained 3D Scene Reconstruction                            
                            
                            
                            
                        
Local- and Holistic-structure Preserving Image Super Resolution via Deep Joint Component Learning                            
                            
                            
                            
                            
                        2015
                        
Human Parsing with Contextualized Convolutional Neural Network                            
                            
                            
                            
                            
                        
Towards Computational Baby Learning: A Weakly-supervised Approach for Object Detection                            
                            
                            
                            
                        
Transferred Human Parsing with Video Context                            
                            
                            
                            
PISA: Pixelwise Image Saliency by Aggregating Complementary Appearance Contrast Measures with Edge-Preserving Coherence                            
                            
                            
                            
                        
Deep Feature Learning with Relative Distance Comparison for Person Re-identification
                            
                            
                            
                        
Kernel Sparse Representation for Time Series Classification                            
                            
                            
                            
                        
Hierarchical Ensemble of Background Models for PTZ-based Video Surveillance                            
                            
                            
                            
                        
Discriminative Learning of Iteration-Wise Priors for Blind Deconvolution                            
                            
                            
                            
                        
Matching-CNN Meets KNN: Quasi-Parametric Human Parsing                            
                            
                            
                            
                        
End-to-End Photo-Sketch Generation via Fully Convolutional Representation Learning
                            
                            
                            
                        
Data-Driven Scene Understanding with Adaptively Retrieved Exemplars                            
                            
                            
                            
                        
A Deep Joint Learning Approach for Age Invariant Face Verification                            
                            
                            
                            
                        
Visual Saliency Based on Multiscale Deep Features                            
                            
                            
                            
                        
Multiple human tracking based on distributed collaborative cameras                            
                            
                            
                            
                        
Weighted Nuclear Norm Minimization Based Tongue Specular Reflection Removal                            
                            
                            
                            
Multi-Loss Regularized Deep Neural Network                            
                            
                            
                            
                        
Bit-Scalable Deep Hashing with Regularized Similarity Learning for Image Retrieval and Person Re-identification                            
                            
                            
                            
                        2014
                        
                        
3D Human Activity Recognition with Reconfigurable Convolutional Neural Networks                            
                            
                            
                            
                            
                        
Complex Background Subtraction by Pursuing Dynamic Spatio-Temporal Models                            
                            
                            
                            
                        
Fashion Parsing with Video Context                            
                            
                            
                            
                            
                        
Person Search in a Scene by Jointly Modeling People Commonness and Person Uniqueness

Robust Feature Point Matching with Sparse Model

Adaptive Scene Category Discovery with Generative Learning and Compositional Sampling

An Expressive Deep Model for Parsing Human Action from a Single Image

Deep Boosting: Layered Feature Mining for General Image Classification

Clothing Co-Parsing by Joint Image Segmentation and Labeling

Recognizing Focal Liver Lesions in Contrast-Enhanced Ultrasound with Discriminatively Trained Spatio-Temporal Model

Towards a solid solution of real-time fire and flame detection

Salient object detection based on regions

Pulse Waveform Classification Using Support Vector Machine with Gaussian Time Warp Edit Distance Kernel

2013

Contextualized Trajectory Parsing with Spatio-Temporal Graph

Video Stylization: Painterly Rendering and Optimization with Content Extraction

Sparse Learning-to-Rank via an Efficient Primal-Dual Algorithm

Discovering Video Shot Categories by Unsupervised Stochastic Graph Partition

Incorporating Structural Alternatives and Sharing into Hierarchy for Multiclass Object Recognition and Detection

PISA: Pixelwise Image Saliency by Aggregating Complementary Appearance Contrast Measures with Spatial Priors

Robust Region Grouping via Internal Patch Statistics

Learning Latent Spatio-Temporal Compositional Model for Human Action Recognition

Integrating multi-stage depth-induced contextual information for human action recognition and localization

Human Re-identification by Matching Compositional Template with Cluster Sampling

SYM-FISH: A Symmetry-aware Flip Invariant Sketch Histogram Shape Descriptor

Correntropy Induced L2 Graph for Robust Subspace Clustering

2012

Integrating Graph Partitioning and Matching for Trajectory Analysis in Video Surveillance

Object Categorization with Sketch Representation and Generalized Samples

Representing and Recognizing Objects with Massive Local Image Patches

Dynamical And-Or Graph Learning for Object Shape Modeling and Detection

Robust Stroke-based Video Animation via Layered Motion and Correspondence

Joint Semantic Segmentation by Searching for Compatible-Competitive References

Object-Layout-Aware Image Retrieval for Personal Album Management

Learning Contour-Fragment-based Shape Model with And-Or Tree Representation

Cross-based Local Multipoint Filtering

Enhancing group awareness on the web: Prototype and experiments of sharing web page visitation information among teammates

2011

Integrating Spatio-temporal Context with Multiview Representation for Object Recognition in Visual Surveillance

Group Crumb: Sharing Web Navigation by Visualizing Group Traces on the Web

Adaptive Object Tracking by Learning Hybrid Template On-line

High Resolution Face Fusion for Gender Conversion

Segment an Image by Looking into an Image Corpus

Interactive CT image segmentation with online discriminative learning

Color style transfer by constraint locally linear embedding

2010

Layered Graph Matching with Composite Cluster Sampling

I2T: Image Parsing to Text Description

Skeletonization with Particle Filters

Learning Shape Detector by Quantizing Curve Segments with Multiple Distance Metrics

Painterly Animation Using Video Semantics and Feature Correspondence

Tracking Objects with Adaptive Feature Patches for PTZ Camera Visual Surveillance

Semantics-driven portrait cartoon stylization

A Discriminative Model for Object Representation and Detection via Sparse Features

Classification of Pulse Waveforms Using Edit Distance with Real Penalty

Gaussian ERP Kernel Classifier for Pulse Waveforms Classification

Time Series Classification Using Support Vector Machine with Gaussian Elastic Metric Kernel

Pulse Waveform Classification Using ERP-Based Difference-Weighted KNN Classifier

Distinguishing Patients with Gastritis and Cholecystitis from the Healthy by Analyzing Wrist Radial Arterial Doppler Blood Flow Signals

Classification of Wrist Pulse Blood Flow Signal Using Time Warp Edit Distance

Multitasking Bar: Prototype and Evaluation of Introducing the Task Concept into a Browser