Yande.Re 2020 samples and metadata

Category:

Art - Pictures

Date:

2021-01-03 13:38

Submitter:

AlexPUA

File size:

132.7 GiB

Completed:

151

Info hash:

75f0b8fd1a7449d8b005ec67e645b34539affe0a

Yande.Re is one of the well-known anime/game/CG imageboards, devoted mostly to high-quality scans (artbooks etc.).
It has a strong community (which gives images adequate scores) and a well-organized (not too verbose) tagging system.
Yande.Re also offers a wide variety of picture composition and quality: from trash-like partial scans and completely
text-filled artbook pages to clean art, from bad-quality scans and almost invisible line art to pure full-color digital.
That is why Yande.Re is a good source for investigating non-photographic images and their metadata (similar to
the Gwern Danbooru dataset) and for building tools to auto-classify all of that (or simply to make your eyes happy).

This release contains:

JSON metadata for 618,801 Yande.Re posts, from the first post up to ID 700,000 (2020-10-29), except those that failed to download (deleted posts etc.)
- with a simple Python script showing how to grab it
- with a pretty-printed example to illustrate structure and content
397,691 “sample” images as prepared by the site at reasonable quality (samples were introduced starting with ID=165352, 2010-12-15)
- longer side = 1500 px, but no more than 1.8 MPix
- JPEG quality 92%, some optimizations done
additional TSV (tab-separated text) metadata
- key parameters of 403,933 posts, including some calculated stats
  ~ derived from JSON
  ~ computed with ImageMagick over the above-mentioned “samples”
- tag list, including some calculated stats over the tags and external references
- tag-to-post relation as a separate table (4,261,026 rows)
some database (Oracle), Batch (Windows) and Python scripts
- data structure definitions
- key processing steps in the database
- some query examples
- tools for computing stats
a more detailed readme for DATA
BONUS 1: usage example that builds a “BUST DATASET” based on the Nagadomi face detector
- scripts and some description
- several zipped folders (BUSTS suffix) with transformed and cleaned-up “busts” (upper body)
- RAW results have to be manually filtered and arranged
- an upper body detector can be built once the dataset becomes big and clean enough
BONUS 2: usage example with the notAI-tech NudeNet TensorFlow-based object detector
- scripts and more description
- several zipped folders (NUDE suffix) with marked samples
- given enough resources, it can be used
  - to recheck after manual de-hentaing (as I did)
  - for semi-manual blurring to make images “less explicit”
  - for scene segmentation and person distinction based on related groups of body parts
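
The sample size rule listed above (longer side 1500 px, but no more than 1.8 MPix) can be illustrated with a small sketch. This is only one plausible reading of that rule, not the site's actual resizing code: the function name `sample_size`, the rounding down, and treating 1 MPix as 1,000,000 pixels are all assumptions.

```python
import math


def sample_size(width, height, max_side=1500, max_pixels=1_800_000):
    """Estimate sample dimensions for an original of width x height.

    Assumed reading of the rule: first cap the longer side at max_side,
    then shrink further if the area still exceeds max_pixels.
    """
    long_side = max(width, height)
    new_long = min(long_side, max_side)  # never upscale small originals
    # Integer arithmetic keeps the aspect ratio without float rounding noise.
    new_w = width * new_long // long_side
    new_h = height * new_long // long_side

    area = new_w * new_h
    if area > max_pixels:
        # Near-square images hit the area cap before the side cap.
        scale = math.sqrt(max_pixels / area)
        new_w = math.floor(new_w * scale)
        new_h = math.floor(new_h * scale)
    return new_w, new_h


print(sample_size(3000, 2000))
print(sample_size(3000, 3000))
```

Under this reading the area cap only matters for images whose shorter side is more than 0.8 of the longer one, since 1500 x 1200 is exactly 1.8 MPix.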

The release includes “samples” for only about 2/3 of all posts, those with “good enough” images whose originals are worth getting:

file_ext in ('jpg','png')
and greatest(image_height,image_width)>=1200 -- not too small
and least(image_height,image_width)>=1000
and image_height*image_width>=1310720 -- (1280x1024)
and image_width/image_height between 0.4 and 2.1 -- not too disproportional
rating in ('s','q'), in separate folders/zips
- 457 explicit samples (evident sexual acts or male genitalia) and 3033 explicit-like samples (mostly because genitals
  are too exposed or drawn as absent) are excluded from ‘questionable’; this is marked in the metadata as “directories”
grabbed files are renamed to “ID - up_to_3_copyrights ~ up_to_5_characters (up_to_2_artists)”
- tags are concatenated with “+”, spaces are replaced with underscores
- maximum file name length is 220 characters; character tags may be truncated if too long
- this enables file system search and sampling (with masked XCOPY, UNZIP etc.)
some gentle deduplication done (minus 2752 images), preferring the ‘s’ rating and newer posts
- applied when there is no visible artistic difference, though there may be technical ones
- a few practically blank pages thrown out
- so lots (~5000) of similar images are left (that’s typical for Yande.Re)
no filter applied by score and/or tags
- the initial idea was to include only “the best of” and exclude “banned tags”
- but the border of “acceptable quality” turned out to be fuzzy
- user score vs. tags vs. technical metadata may be a field for analysis
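
The selection predicate and the renaming scheme described above can be sketched in Python. This is an illustration only, not the release's own scripts: the function names, the `.jpg` extension default, counting the extension inside the 220-character limit, and the exact truncation rule are assumptions. The source only states that character tags may be truncated.

```python
def is_good_enough(file_ext, width, height):
    """Python rendering of the SQL-style selection predicate."""
    if width <= 0 or height <= 0:
        return False
    return (
        file_ext in ("jpg", "png")
        and max(height, width) >= 1200        # not too small
        and min(height, width) >= 1000
        and height * width >= 1_310_720       # 1280x1024
        and 0.4 <= width / height <= 2.1      # not too disproportional
    )


def sample_file_name(post_id, copyrights, characters, artists,
                     ext="jpg", max_len=220):
    """Build 'ID - copyrights ~ characters (artists).ext'.

    Assumption: only the character part is shortened to fit max_len.
    """
    def join(tags, limit):
        # Keep the first `limit` tags, spaces -> underscores, joined by "+".
        return "+".join(tag.replace(" ", "_") for tag in tags[:limit])

    head = f"{post_id} - {join(copyrights, 3)} ~ "
    tail = f" ({join(artists, 2)}).{ext}"
    room = max(max_len - len(head) - len(tail), 0)
    return head + join(characters, 5)[:room] + tail


# Hypothetical tags, for illustration only.
print(is_good_enough("jpg", 1280, 1024))
print(sample_file_name(165352, ["some series"], ["alice", "bob"], ["artist one"]))
```

Note that the area condition is the binding one for small images: a 1200x1000 picture passes both side checks but fails the 1,310,720-pixel minimum.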

Sample images are archived in groups of 10,000 IDs: NNxxxx[.q].zip (q = questionable), NN = 16…69
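
To find which archive should hold a given post, the grouping can be sketched as below. The helper name `archive_name` is hypothetical, and the lowercase `.q` suffix follows the names in the file list. The BUSTS, NUDE and `31xxxx.2.zip` archives are not covered by this sketch.

```python
def archive_name(post_id, rating):
    """Return the zip expected to contain a post's sample image."""
    group = post_id // 10_000  # 10,000 IDs per archive
    if not 16 <= group <= 69:
        raise ValueError(f"no sample archive for ID {post_id}")
    if rating not in ("s", "q"):
        raise ValueError(f"rating {rating!r} is not part of the release")
    suffix = ".q" if rating == "q" else ""
    return f"{group}xxxx{suffix}.zip"


print(archive_name(165352, "s"))
print(archive_name(699999, "q"))
```

A matching archive name does not guarantee the sample is present, since posts were also dropped by the size filter and by deduplication.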

I recommend using FastStone MaxView to browse images inside the zips.

HERE is a release for konachan.com created the same way.

THERE ARE some rips on the Nyaa tracker for Safebooru and Zerochan. No nipples to detect there.

File list

Yande_re_2020
- DATA
  - YJ000000.csv (1.9 MiB)
  - YJ0xxxxx.csv (143.4 MiB)
  - YJ1xxxxx.csv (147.0 MiB)
  - YJ2xxxxx.csv (151.2 MiB)
  - YJ3xxxxx.csv (164.3 MiB)
  - YJ4xxxxx.csv (167.7 MiB)
  - YJ5xxxxx.csv (168.4 MiB)
  - YJ6xxxxx.csv (167.4 MiB)
  - Y_pretty.json (3.0 KiB)
  - busts_y_16_17.csv (829.4 KiB)
  - busts_y_18_20.csv (1.7 MiB)
  - busts_y_40q.csv (352.4 KiB)
  - nude_y_60q.csv (1.9 MiB)
  - nude_y_61q.csv (1.8 MiB)
  - nude_y_64_66.csv (2.5 MiB)
  - nude_y_68q.csv (2.0 MiB)
  - nude_y_69q.csv (2.0 MiB)
  - yndr_copyr_char_tags.tsv (1.6 MiB)
  - yndr_dt.tsv (93.6 MiB)
  - yndr_pool_posts.tsv (2.3 MiB)
  - yndr_pools.tsv (269.9 KiB)
  - yndr_posts.tsv (379.9 MiB)
  - yndr_rip_ALL.tsv (299.4 MiB)
  - yndr_rip_RU.tsv (154.1 MiB)
- TOOLS
  - #IM__Y.bat (343 Bytes)
  - #IM_looY.bat (586 Bytes)
  - #bust__Y.bat (171 Bytes)
  - #bust_looY.py (1.3 KiB)
  - #nude.bat (95 Bytes)
  - #nude_loop.py (4.6 KiB)
  - #yndr.bat (937 Bytes)
  - #yndr_exif.ctl (295 Bytes)
  - #yndr_im.ctl (339 Bytes)
  - #yndr_j.ctl (120 Bytes)
  - #yndr_out.sql (4.2 KiB)
  - #yndr_out_ALL.sql (2.2 KiB)
  - $DATA_readme_ALL.txt (5.1 KiB)
  - $DATA_readme_RU.txt (3.9 KiB)
  - _yndr_ORA_DDL.sql (4.1 KiB)
  - _yndr_ORA_load.sql (4.2 KiB)
  - _yndr_ORA_make.sql (2.2 KiB)
  - _yndr_ORA_output.sql (2.2 KiB)
  - aria.bat (86 Bytes)
  - aria_urls.lst (3.8 KiB)
  - lbpcascade_animeface.xml (241.2 KiB)
  - yndr_grab_json.py (1.2 KiB)
  - yndr_grab_samples.py (1.2 KiB)
  - yndr_id.lst (336 Bytes)
- 16xxxx.q.zip (294.5 MiB)
- 16xxxx.zip (627.2 MiB)
- 17xxxx.BUSTS.zip (237.7 MiB)
- 17xxxx.q.zip (806.9 MiB)
- 17xxxx.zip (1.4 GiB)
- 18xxxx.q.zip (616.4 MiB)
- 18xxxx.zip (1.6 GiB)
- 19xxxx.q.zip (787.8 MiB)
- 19xxxx.zip (1.4 GiB)
- 20xxxx.BUSTS_RAW.zip (358.7 MiB)
- 20xxxx.q.zip (1.0 GiB)
- 20xxxx.zip (1.3 GiB)
- 21xxxx.q.zip (732.9 MiB)
- 21xxxx.zip (1.4 GiB)
- 22xxxx.q.zip (1.0 GiB)
- 22xxxx.zip (1.2 GiB)
- 23xxxx.q.zip (796.7 MiB)
- 23xxxx.zip (1.2 GiB)
- 24xxxx.q.zip (1.6 GiB)
- 24xxxx.zip (674.9 MiB)
- 25xxxx.q.zip (819.1 MiB)
- 25xxxx.zip (1.3 GiB)
- 26xxxx.q.zip (935.8 MiB)
- 26xxxx.zip (1.2 GiB)
- 27xxxx.q.zip (1.0 GiB)
- 27xxxx.zip (995.7 MiB)
- 28xxxx.q.zip (802.7 MiB)
- 28xxxx.zip (1.2 GiB)
- 29xxxx.q.zip (903.2 MiB)
- 29xxxx.zip (1.1 GiB)
- 30xxxx.q.zip (982.8 MiB)
- 30xxxx.zip (1013.0 MiB)
- 31xxxx.2.zip (1.1 GiB)
- 31xxxx.q.zip (1.1 GiB)
- 31xxxx.zip (867.5 MiB)
- 32xxxx.q.zip (1.1 GiB)
- 32xxxx.zip (1.2 GiB)
- 33xxxx.q.zip (1.1 GiB)
- 33xxxx.zip (1.4 GiB)
- 34xxxx.q.zip (1.0 GiB)
- 34xxxx.zip (1.3 GiB)
- 35xxxx.q.zip (1.1 GiB)
- 35xxxx.zip (1.3 GiB)
- 36xxxx.q.zip (956.9 MiB)
- 36xxxx.zip (1.3 GiB)
- 37xxxx.q.zip (966.1 MiB)
- 37xxxx.zip (1.5 GiB)
- 38xxxx.q.zip (793.7 MiB)
- 38xxxx.zip (1.5 GiB)
- 39xxxx.q.zip (757.6 MiB)
- 39xxxx.zip (1.6 GiB)
- 40xxxx.q.BUSTS.zip (194.5 MiB)
- 40xxxx.q.zip (837.5 MiB)
- 40xxxx.zip (1.6 GiB)
- 41xxxx.q.zip (659.4 MiB)
- 41xxxx.zip (1.7 GiB)
- 42xxxx.q.zip (883.2 MiB)
- 42xxxx.zip (1.5 GiB)
- 43xxxx.q.zip (822.7 MiB)
- 43xxxx.zip (1.5 GiB)
- 44xxxx.q.zip (880.2 MiB)
- 44xxxx.zip (1.6 GiB)
- 45xxxx.q.zip (832.5 MiB)
- 45xxxx.zip (1.4 GiB)
- 46xxxx.q.zip (951.4 MiB)
- 46xxxx.zip (1.5 GiB)
- 47xxxx.q.zip (1.1 GiB)
- 47xxxx.zip (1.3 GiB)
- 48xxxx.q.zip (1.1 GiB)
- 48xxxx.zip (1.1 GiB)
- 49xxxx.q.zip (1.1 GiB)
- 49xxxx.zip (1.2 GiB)
- 50xxxx.q.zip (1.2 GiB)
- 50xxxx.zip (1.2 GiB)
- 51xxxx.q.zip (1.5 GiB)
- 51xxxx.zip (1.0 GiB)
- 52xxxx.q.zip (1.3 GiB)
- 52xxxx.zip (1.2 GiB)
- 53xxxx.q.zip (1.3 GiB)
- 53xxxx.zip (1.0 GiB)
- 54xxxx.q.zip (1.0 GiB)
- 54xxxx.zip (1.3 GiB)
- 55xxxx.q.zip (896.4 MiB)
- 55xxxx.zip (1.4 GiB)
- 56xxxx.q.zip (1.1 GiB)
- 56xxxx.zip (1.1 GiB)
- 57xxxx.q.zip (1.1 GiB)
- 57xxxx.zip (960.0 MiB)
- 58xxxx.q.zip (1.1 GiB)
- 58xxxx.zip (1.0 GiB)
- 59xxxx.q.zip (1.2 GiB)
- 59xxxx.zip (1018.3 MiB)
- 60xxxx.q.NUDE.zip (1.8 GiB)
- 60xxxx.q.zip (1.4 GiB)
- 60xxxx.zip (883.9 MiB)
- 61xxxx.q.NUDE.zip (1.6 GiB)
- 61xxxx.q.zip (1.2 GiB)
- 61xxxx.zip (961.5 MiB)
- 62xxxx.q.zip (1.3 GiB)
- 62xxxx.zip (914.8 MiB)
- 63xxxx.q.zip (1.2 GiB)
- 63xxxx.zip (1.1 GiB)
- 64xxxx.NUDE.zip (1.3 GiB)
- 64xxxx.q.zip (1.1 GiB)
- 64xxxx.zip (1.0 GiB)
- 65xxxx.NUDE.zip (1.1 GiB)
- 65xxxx.q.zip (1.2 GiB)
- 65xxxx.zip (897.3 MiB)
- 66xxxx.NUDE.zip (1.1 GiB)
- 66xxxx.q.zip (1.3 GiB)
- 66xxxx.zip (887.8 MiB)
- 67xxxx.q.zip (1.3 GiB)
- 67xxxx.zip (897.3 MiB)
- 68xxxx.q.NUDE.zip (1.6 GiB)
- 68xxxx.q.zip (1.2 GiB)
- 68xxxx.zip (799.1 MiB)
- 69xxxx.q.NUDE.zip (1.5 GiB)
- 69xxxx.q.zip (1.1 GiB)
- 69xxxx.zip (881.7 MiB)
