/README.md
https://github.com/Caoimhyn/YOLO-R-MxNet · Markdown · 46 lines · 26 code · 20 blank · 0 comment · 0 complexity · f8a173d486a0a17357237051825e8a75 MD5 · raw file
- # YOLO v1 with R language ( MxNet library )
- (Version 0.1, Last updated :2018.07.02)
- #### [MxNet](https://mxnet.apache.org/):A flexible and efficient library for deep learning.
- ## 1. Introduction
- This is mxnet implementation of the YOLO:Real-Time Object Detection.
- YOLO is an unified framework for object detection with a single network.
- It has been originally introduced in this research [article](https://pjreddie.com/media/files/papers/yolo.pdf).
- This repository contains a MxNet implementation of a MobileNets_V2-based YOLO networks.
- For details with Google's MobileNets, please read the following papers:
- - [v1] [MobileNets: Efficient Convolutional Neural Networks for Mobile Vision Applications](https://arxiv.org/abs/1704.04861)
- - [v2] [Inverted Residuals and Linear Bottlenecks: Mobile Networks for Classification, Detection and Segmentation](https://arxiv.org/abs/1801.04381)
- ## 2. Pretrained Models on ImageNet
- See: https://github.com/yuantangliang/MobileNet-v2-Mxnet
- The top-1/5 accuracy rates by using single center crop (crop size: 224x224, image size: 256xN):
- Network|Top-1|Top-5|sha256sum|Architecture
- :---:|:---:|:---:|:---:|:---:
- MobileNet v2| 71.90| 90.49| a3124ce7 (13.5 MB)| [netscope](http://ethereon.github.io/netscope/#/gist/d01b5b8783b4582a42fe07bd46243986)
- ## 3. Pikachu data
- For testing model purposes, we’ll train our model to detect Pikachu in the wild. We use a synthetic toy dataset by rendering images from open-sourced 3D Pikachu models.
- For more detail. Please see:
- - https://gluon.mxnet.io/chapter08_computer-vision/object-detection.html.
- - http://zh.gluon.ai/chapter_computer-vision/pikachu.html.
- <p align="center">
- <img src="https://user-images.githubusercontent.com/3307514/29479494-5dc28a02-8427-11e7-91d0-2849b88c17cd.png">
- </p>
- The dataset consists of 1088 pikachus with random pose/scale/position in random background images. The exact locations are recorded as ground-truth for training and validation.