Yande.Re 2020 samples and metadata

Date: 2021-01-03 13:38 UTC
File size: 132.7 GiB
Info hash: 75f0b8fd1a7449d8b005ec67e645b34539affe0a
Seeders: 1
Leechers: 0
Completed: 142
Yande.Re is one of the best-known anime/game/CG imageboards, devoted mostly to high-quality scans (artbooks etc.). It has a strong community (which gives images adequate scores) and a well-organized (not overly verbose) tagging system. Yande_Re also offers a wide variety of picture composition and quality: from trash-like partial scans and artbook pages completely filled with text to clean art, and from bad-quality scans and almost invisible line art to pure full-color digital work. That is why Yande_Re is a good source for investigating non-photographic images and their metadata (similar to Gwern's Danbooru dataset) and for building tools to auto-classify all of that (or simply for making your eyes happy).

This release contains:
- JSON metadata for 618,801 Yande_Re posts, from the start up to ID 700,000 (29.10.2020), except those that failed to grab (deleted posts etc.)
  * with a simple Python script showing how to do it
  * with a pretty-printed example illustrating structure and content
- 397,691 "sample" images as prepared by the site at reasonable quality (samples were introduced from ID=165352, 15.12.2010)
  * longer side = 1500 px, but no more than 1.8 MPix
  * JPEG quality 92%, some optimizations done
- additional TSV (tab-separated text) metadata
  * key parameters of 403,933 posts, including some calculated stats
    ~ derived from JSON
    ~ computed with ImageMagick over the above-mentioned "samples"
  * tag list, including some stats calculated over the tags and external references
  * tag-to-post relation as a separate table (4,261,026 rows)
- some database (Oracle), Batch (Windows) and Python scripts
  * data structure definitions
  * key processing steps in the database
  * some query examples
  * tools for computing stats
- a more detailed readme for DATA
- BONUS 1: an example of usage to make a "BUST DATASET" based on the [Nagadomi face detector](https://github.com/nagadomi/lbpcascade_animeface)
  * scripts and some description
  * several zipped folders (BUSTS suffix) with transformed and cleaned-up "busts" (upper body)
  * RAW results have to be manually filtered and arranged
  * an upper-body detector can be built once the dataset becomes big and clean enough
- BONUS 2: an example of usage with the [notAI-tech NudeNet](https://github.com/notAI-tech/NudeNet) TensorFlow-based object detector
  * scripts and more description
  * several zipped folders (NUDE suffix) with marked samples
  * given enough resources, it can be used
    ~ for rechecking after manual "de-hentaing" (as I did)
    ~ for semi-manual blurring to make images "less explicit"
    ~ [for scene segmentation and person distinction based on related groups of body parts](https://www.kaggle.com/printcraft/anime-and-cg-characters-detection-using-yolov5)

The release includes "samples" only for the roughly 2/3 of all posts whose images are "good enough" to be worth getting as originals:
- file_ext in ('jpg','png')
- greatest(image_height,image_width)>=1200 -- not too small
  and least(image_height,image_width)>=1000
  and image_height*image_width>=1310720 -- (1280x1024)
  and image_width/image_height between 0.4 and 2.1 -- not too disproportionate
- rating in ('s','q'), in separate folders/zips
  * 457 explicit samples (evident sex, masturbation, penis) and 3,033 explicit-like samples (mostly because genitals are too exposed or absent) were excluded from 'questionable'; they are marked in the metadata as "directories"
- grabbed files are renamed to contain "ID - up_to_3_copyrights ~ up_to_5_characters (up_to_2_artists)"
  * tags are concatenated with "+", spaces replaced with underscores
  * maximum file name length is 220 characters; character tags may be truncated if too long
  * this enables file-system search and sampling (with masked XCOPY, UNZIP etc.)
- some gentle deduplication was done (minus 2,752 images), preferring 's' rating and newer posts
  * applied when there is no visible artistic difference, though there may be technical differences
  * a few practically blank pages were thrown out
  * so lots (~5,000) of similar images remain (that's typical of Yande_Re)
- no filter was applied by score and/or tags
  * the initial idea was to include only "the best of" and exclude "banned tags"
  * the border of "acceptable quality" turned out to be fuzzy
  * user score vs tags vs technical metadata may be a field of analysis in itself

Sample images are archived in groups of 10,000 IDs as NNxxxx.zip and NNxxxx.q.zip (q = questionable), NN = 16..69.

I recommend using FastStone MaxView to browse images inside zips.

[HERE](https://sukebei.nyaa.si/view/3204613) is a release created the same way for konachan.com.

[THERE ARE](https://nyaa.si/user/AlexPUA) some rips on the Nyaa tracker for Safebooru and Zerochan. No nipples to detect there.
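Read literally, the sample-size rule above (longer side 1500 px, but no more than 1.8 MPix) can be sketched as follows. This is not the site's code: it assumes that "1.8 MPix" means 1,800,000 pixels, that smaller originals are never upscaled, and that dimensions are rounded down.

```python
import math

MAX_SIDE = 1500        # longer side of a sample, per the description
MAX_AREA = 1_800_000   # assumption: "1.8 MPix" read as 1,800,000 pixels


def sample_size(width: int, height: int) -> tuple[int, int]:
    """Return (width, height) of a sample under the stated rule (sketch)."""
    scale = min(
        1.0,                                     # assumption: never upscale
        MAX_SIDE / max(width, height),           # longer side -> 1500 px
        math.sqrt(MAX_AREA / (width * height)),  # area cap
    )
    # Round down so rounding itself does not push past either limit.
    return math.floor(width * scale), math.floor(height * scale)


if __name__ == "__main__":
    print(sample_size(3000, 2000))  # (1500, 1000): the 1500 px side wins
    print(sample_size(2000, 2000))  # (1341, 1341): the area cap wins
```

For square-ish originals the area cap is the binding limit, which is why such samples end up smaller than 1500 px on both sides.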
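The tag-to-post relation table described above lends itself to simple aggregation. A minimal sketch that counts posts per tag, assuming a hypothetical layout of two tab-separated columns (post_id, then tag) with no header row; the real column order is documented in the DATA readme and may differ.

```python
import csv
from collections import Counter


def posts_per_tag(lines):
    """Count how many posts carry each tag in a tag-to-post table (sketch).

    Hypothetical layout: tab-separated post_id and tag columns, in that
    order, with no header row.
    """
    counts = Counter()
    # QUOTE_NONE: treat quote characters inside tags as ordinary text.
    for record in csv.reader(lines, delimiter="\t", quoting=csv.QUOTE_NONE):
        if len(record) < 2:
            continue  # skip blank or short lines
        counts[record[1]] += 1
    return counts
```

Usage would look like `posts_per_tag(open(path, encoding="utf-8", newline="")).most_common(10)`, where `path` points at the tag-to-post TSV (its exact file name and encoding are assumptions here).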
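BONUS 1 turns face detections into upper-body crops. OpenCV's `CascadeClassifier.detectMultiScale`, used with nagadomi's `lbpcascade_animeface.xml`, returns faces as `(x, y, w, h)` rectangles. One way to grow such a rectangle into a bust box is sketched below; the expansion factors `side`, `up` and `down` are hypothetical values, not the ones used in `#bust_looY.py`.

```python
def bust_box(face, img_w, img_h, side=1.5, up=0.5, down=2.5):
    """Grow a face rectangle (x, y, w, h) into an upper-body crop box.

    side/up/down are hypothetical multiples of the face size. Returns
    (left, top, right, bottom) clamped to the image, which is the box
    order Pillow's Image.crop() expects.
    """
    x, y, w, h = face
    cx = x + w / 2                               # horizontal face center
    left = max(0, int(cx - side * w))
    right = min(img_w, int(cx + side * w))
    top = max(0, int(y - up * h))                # a little headroom above
    bottom = min(img_h, int(y + h + down * h))   # shoulders and chest below
    return left, top, right, bottom
```

Each rectangle from `detectMultiScale` could be passed to `bust_box`, and the result to `Image.crop`; the RAW crops would still need the manual filtering described above.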
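The SQL-like selection criteria above can be restated in Python for anyone filtering the JSON or TSV metadata outside Oracle. This is a sketch: the dict shape is an assumption, with field names taken from the conditions themselves. SQL `BETWEEN` is inclusive, so the ratio check is inclusive on both ends.

```python
def is_good_enough(post: dict) -> bool:
    """Python restatement of the sample selection criteria (sketch)."""
    w, h = post["image_width"], post["image_height"]
    return (
        post["file_ext"] in ("jpg", "png")
        and max(w, h) >= 1200              # not too small
        and min(w, h) >= 1000
        and w * h >= 1280 * 1024           # 1,310,720 px
        and 0.4 <= w / h <= 2.1            # not too disproportionate
        and post["rating"] in ("s", "q")
    )
```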
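The file naming scheme above can be sketched as a small function. Two details are assumptions: the 220-character limit is applied to the name without extension, and "truncating" character tags means dropping whole tags from the end (the original script may cut differently).

```python
MAX_STEM = 220  # assumption: limit applies to the name without extension


def _join_tags(tags):
    """Join tags with '+', replacing spaces with underscores."""
    return "+".join(t.replace(" ", "_") for t in tags)


def sample_file_name(post_id, copyrights, characters, artists, ext="jpg"):
    """Build 'ID - copyrights ~ characters (artists).ext' (sketch)."""
    copy_part = _join_tags(copyrights[:3])
    artist_part = _join_tags(artists[:2])
    chars = list(characters[:5])

    def stem(char_list):
        return f"{post_id} - {copy_part} ~ {_join_tags(char_list)} ({artist_part})"

    name = stem(chars)
    # Character tags are the part allowed to shrink: drop them from the end.
    while len(name) > MAX_STEM and chars:
        chars.pop()
        name = stem(chars)
    # Fallback (assumption): hard cut if copyrights/artists alone are too long.
    return f"{name[:MAX_STEM]}.{ext}"
```

For example, `sample_file_name(1234, ["touhou"], ["hakurei reimu", "kirisame marisa"], ["zun"])` gives `1234 - touhou ~ hakurei_reimu+kirisame_marisa (zun).jpg` (tag values here are illustrative).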
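The deduplication preference above ('s' rating first, then newer posts) amounts to a two-part sort key. A minimal sketch, assuming that "newer" means a higher post ID; finding the near-duplicate groups themselves (done by eye here) is not shown.

```python
def pick_keeper(group):
    """From a group of near-duplicate posts, choose the one to keep.

    Preference: rating 's' first, then the newest post (assumed here
    to be the one with the highest ID).
    """
    return max(group, key=lambda p: (p["rating"] == "s", p["id"]))


def dropped_ids(group):
    """IDs of the posts removed from a duplicate group, sorted."""
    keeper = pick_keeper(group)
    return sorted(p["id"] for p in group if p is not keeper)
```

Putting the rating test first in the key tuple means a safe older post beats a questionable newer one, matching the stated preference order.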
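Because tags are embedded in file names, masked search also works directly on the zips, without unpacking them. A minimal sketch using Python's `zipfile` and `fnmatch` as a stand-in for masked XCOPY/UNZIP; making the match case-insensitive is a design choice of this sketch.

```python
import fnmatch
import zipfile


def find_in_zip(archive, pattern):
    """Return member names of a zip that match a wildcard mask (sketch).

    `archive` may be a path or a binary file object. Matching is
    case-insensitive here by lowercasing both sides.
    """
    pattern = pattern.lower()
    with zipfile.ZipFile(archive) as zf:
        return [name for name in zf.namelist()
                if fnmatch.fnmatchcase(name.lower(), pattern)]
```

For example, `find_in_zip("16xxxx.zip", "*~*hakurei_reimu*")` would list samples whose character part mentions that (illustrative) tag; `ZipFile.extract` can then pull out just those members.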

File list

  • Yande_re_2020
    • DATA
      • YJ000000.csv (1.9 MiB)
      • YJ0xxxxx.csv (143.4 MiB)
      • YJ1xxxxx.csv (147.0 MiB)
      • YJ2xxxxx.csv (151.2 MiB)
      • YJ3xxxxx.csv (164.3 MiB)
      • YJ4xxxxx.csv (167.7 MiB)
      • YJ5xxxxx.csv (168.4 MiB)
      • YJ6xxxxx.csv (167.4 MiB)
      • Y_pretty.json (3.0 KiB)
      • busts_y_16_17.csv (829.4 KiB)
      • busts_y_18_20.csv (1.7 MiB)
      • busts_y_40q.csv (352.4 KiB)
      • nude_y_60q.csv (1.9 MiB)
      • nude_y_61q.csv (1.8 MiB)
      • nude_y_64_66.csv (2.5 MiB)
      • nude_y_68q.csv (2.0 MiB)
      • nude_y_69q.csv (2.0 MiB)
      • yndr_copyr_char_tags.tsv (1.6 MiB)
      • yndr_dt.tsv (93.6 MiB)
      • yndr_pool_posts.tsv (2.3 MiB)
      • yndr_pools.tsv (269.9 KiB)
      • yndr_posts.tsv (379.9 MiB)
      • yndr_rip_ALL.tsv (299.4 MiB)
      • yndr_rip_RU.tsv (154.1 MiB)
    • TOOLS
      • #IM__Y.bat (343 Bytes)
      • #IM_looY.bat (586 Bytes)
      • #bust__Y.bat (171 Bytes)
      • #bust_looY.py (1.3 KiB)
      • #nude.bat (95 Bytes)
      • #nude_loop.py (4.6 KiB)
      • #yndr.bat (937 Bytes)
      • #yndr_exif.ctl (295 Bytes)
      • #yndr_im.ctl (339 Bytes)
      • #yndr_j.ctl (120 Bytes)
      • #yndr_out.sql (4.2 KiB)
      • #yndr_out_ALL.sql (2.2 KiB)
      • $DATA_readme_ALL.txt (5.1 KiB)
      • $DATA_readme_RU.txt (3.9 KiB)
      • _yndr_ORA_DDL.sql (4.1 KiB)
      • _yndr_ORA_load.sql (4.2 KiB)
      • _yndr_ORA_make.sql (2.2 KiB)
      • _yndr_ORA_output.sql (2.2 KiB)
      • aria.bat (86 Bytes)
      • aria_urls.lst (3.8 KiB)
      • lbpcascade_animeface.xml (241.2 KiB)
      • yndr_grab_json.py (1.2 KiB)
      • yndr_grab_samples.py (1.2 KiB)
      • yndr_id.lst (336 Bytes)
    • 16xxxx.q.zip (294.5 MiB)
    • 16xxxx.zip (627.2 MiB)
    • 17xxxx.BUSTS.zip (237.7 MiB)
    • 17xxxx.q.zip (806.9 MiB)
    • 17xxxx.zip (1.4 GiB)
    • 18xxxx.q.zip (616.4 MiB)
    • 18xxxx.zip (1.6 GiB)
    • 19xxxx.q.zip (787.8 MiB)
    • 19xxxx.zip (1.4 GiB)
    • 20xxxx.BUSTS_RAW.zip (358.7 MiB)
    • 20xxxx.q.zip (1.0 GiB)
    • 20xxxx.zip (1.3 GiB)
    • 21xxxx.q.zip (732.9 MiB)
    • 21xxxx.zip (1.4 GiB)
    • 22xxxx.q.zip (1.0 GiB)
    • 22xxxx.zip (1.2 GiB)
    • 23xxxx.q.zip (796.7 MiB)
    • 23xxxx.zip (1.2 GiB)
    • 24xxxx.q.zip (1.6 GiB)
    • 24xxxx.zip (674.9 MiB)
    • 25xxxx.q.zip (819.1 MiB)
    • 25xxxx.zip (1.3 GiB)
    • 26xxxx.q.zip (935.8 MiB)
    • 26xxxx.zip (1.2 GiB)
    • 27xxxx.q.zip (1.0 GiB)
    • 27xxxx.zip (995.7 MiB)
    • 28xxxx.q.zip (802.7 MiB)
    • 28xxxx.zip (1.2 GiB)
    • 29xxxx.q.zip (903.2 MiB)
    • 29xxxx.zip (1.1 GiB)
    • 30xxxx.q.zip (982.8 MiB)
    • 30xxxx.zip (1013.0 MiB)
    • 31xxxx.2.zip (1.1 GiB)
    • 31xxxx.q.zip (1.1 GiB)
    • 31xxxx.zip (867.5 MiB)
    • 32xxxx.q.zip (1.1 GiB)
    • 32xxxx.zip (1.2 GiB)
    • 33xxxx.q.zip (1.1 GiB)
    • 33xxxx.zip (1.4 GiB)
    • 34xxxx.q.zip (1.0 GiB)
    • 34xxxx.zip (1.3 GiB)
    • 35xxxx.q.zip (1.1 GiB)
    • 35xxxx.zip (1.3 GiB)
    • 36xxxx.q.zip (956.9 MiB)
    • 36xxxx.zip (1.3 GiB)
    • 37xxxx.q.zip (966.1 MiB)
    • 37xxxx.zip (1.5 GiB)
    • 38xxxx.q.zip (793.7 MiB)
    • 38xxxx.zip (1.5 GiB)
    • 39xxxx.q.zip (757.6 MiB)
    • 39xxxx.zip (1.6 GiB)
    • 40xxxx.q.BUSTS.zip (194.5 MiB)
    • 40xxxx.q.zip (837.5 MiB)
    • 40xxxx.zip (1.6 GiB)
    • 41xxxx.q.zip (659.4 MiB)
    • 41xxxx.zip (1.7 GiB)
    • 42xxxx.q.zip (883.2 MiB)
    • 42xxxx.zip (1.5 GiB)
    • 43xxxx.q.zip (822.7 MiB)
    • 43xxxx.zip (1.5 GiB)
    • 44xxxx.q.zip (880.2 MiB)
    • 44xxxx.zip (1.6 GiB)
    • 45xxxx.q.zip (832.5 MiB)
    • 45xxxx.zip (1.4 GiB)
    • 46xxxx.q.zip (951.4 MiB)
    • 46xxxx.zip (1.5 GiB)
    • 47xxxx.q.zip (1.1 GiB)
    • 47xxxx.zip (1.3 GiB)
    • 48xxxx.q.zip (1.1 GiB)
    • 48xxxx.zip (1.1 GiB)
    • 49xxxx.q.zip (1.1 GiB)
    • 49xxxx.zip (1.2 GiB)
    • 50xxxx.q.zip (1.2 GiB)
    • 50xxxx.zip (1.2 GiB)
    • 51xxxx.q.zip (1.5 GiB)
    • 51xxxx.zip (1.0 GiB)
    • 52xxxx.q.zip (1.3 GiB)
    • 52xxxx.zip (1.2 GiB)
    • 53xxxx.q.zip (1.3 GiB)
    • 53xxxx.zip (1.0 GiB)
    • 54xxxx.q.zip (1.0 GiB)
    • 54xxxx.zip (1.3 GiB)
    • 55xxxx.q.zip (896.4 MiB)
    • 55xxxx.zip (1.4 GiB)
    • 56xxxx.q.zip (1.1 GiB)
    • 56xxxx.zip (1.1 GiB)
    • 57xxxx.q.zip (1.1 GiB)
    • 57xxxx.zip (960.0 MiB)
    • 58xxxx.q.zip (1.1 GiB)
    • 58xxxx.zip (1.0 GiB)
    • 59xxxx.q.zip (1.2 GiB)
    • 59xxxx.zip (1018.3 MiB)
    • 60xxxx.q.NUDE.zip (1.8 GiB)
    • 60xxxx.q.zip (1.4 GiB)
    • 60xxxx.zip (883.9 MiB)
    • 61xxxx.q.NUDE.zip (1.6 GiB)
    • 61xxxx.q.zip (1.2 GiB)
    • 61xxxx.zip (961.5 MiB)
    • 62xxxx.q.zip (1.3 GiB)
    • 62xxxx.zip (914.8 MiB)
    • 63xxxx.q.zip (1.2 GiB)
    • 63xxxx.zip (1.1 GiB)
    • 64xxxx.NUDE.zip (1.3 GiB)
    • 64xxxx.q.zip (1.1 GiB)
    • 64xxxx.zip (1.0 GiB)
    • 65xxxx.NUDE.zip (1.1 GiB)
    • 65xxxx.q.zip (1.2 GiB)
    • 65xxxx.zip (897.3 MiB)
    • 66xxxx.NUDE.zip (1.1 GiB)
    • 66xxxx.q.zip (1.3 GiB)
    • 66xxxx.zip (887.8 MiB)
    • 67xxxx.q.zip (1.3 GiB)
    • 67xxxx.zip (897.3 MiB)
    • 68xxxx.q.NUDE.zip (1.6 GiB)
    • 68xxxx.q.zip (1.2 GiB)
    • 68xxxx.zip (799.1 MiB)
    • 69xxxx.q.NUDE.zip (1.5 GiB)
    • 69xxxx.q.zip (1.1 GiB)
    • 69xxxx.zip (881.7 MiB)
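The regular sample archives above follow the 10,000-ID grouping. A sketch of the mapping from post ID and rating to archive name, assuming the group number is simply the ID divided by 10,000 (consistent with NN = 16..69 for IDs 165352 up to 700,000). The special archives (BUSTS, NUDE and `31xxxx.2.zip`) are not covered.

```python
def archive_name(post_id: int, rating: str) -> str:
    """Name of the regular sample zip holding a post (sketch)."""
    group = post_id // 10_000            # e.g. 165352 -> 16
    suffix = ".q" if rating == "q" else ""  # questionable posts go to .q zips
    return f"{group}xxxx{suffix}.zip"
```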