Detecting Text (OCR) in a Screenshot of a Hentai Game Using Google Cloud Vision

This article is a rough translation of my previous article in Japanese.

There is an article (in Japanese) about text detection on screenshots (SS) of a hentai game (visual novel). I tried the same thing, but I did not get good results. I think this is caused by the transparency of the text box: the text box is usually placed over a background image, and raising its transparency adds non-uniform noise that hurts text detection.
There are many OCR tools and APIs: Tesseract, Google Cloud Vision, and so on. In this article I use Google Cloud Vision. I also tried Tesseract 3.x, but it did not go well. *1
Skipping the details, here are the results.
f:id:youryouryour:20170307113156j:plain
The detected text for this image was

CHAPTER 2-4IR 11【 芳乃/気の緩みは怪我の元です。ぽくないと思います

This result is good.
f:id:youryouryour:20170307113909j:plain
For the next image, the only text detected was

0O

f:id:youryouryour:20170307114113j:plain
The next image was detected as

St1-E中庭3むコクナプラネタリウム鑑賞会VALプ一緒に星空みませんか?0天文同好会主催3(をこ!?テニス部部員募集中:部 lito0者大歓迎「あ のiEANの認はあります扉を開ましょうAl ...ダンス部.カ中夜祭

This result shows that text outside the text box region is also detected.
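
One possible workaround (see also *3 below) is to crop the screenshot to the text box region before sending it to the API. Here is a minimal sketch using the magick package; this is not part of my original workflow, and the file name and crop geometry are placeholder values that depend on your game's layout.

library(magick)

# Load a local screenshot (placeholder file name).
ss <- image_read("screenshot.png")

# Crop to the text box: "width x height + x offset + y offset" (example values).
textbox_only <- image_crop(ss, "1280x220+0+500")

# Save the cropped region and send this file to the Vision API instead.
image_write(textbox_only, path = "textbox_cropped.png", format = "png")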

I used R, but you can use Python or another language as well. The R code is based on d.hatena.ne.jp. It runs OCR on screenshots uploaded to an imgur album. *2 It is easy to change the code so that it reads your local images instead; a minimal sketch for that follows the code below.
The first time, you need to run the following code once.

install.packages("httr")
install.packages("base64enc")
install.packages("imguR")

The OCR code is below.

rm(list = ls())

# Send one image to the Cloud Vision API and return the raw httr response.
# f: path or URL of the image, type: annotation type, size: image size in bytes
getResult <- function(f, type = "TEXT_DETECTION", size){
  library("httr")
  library("base64enc")
  CROWD_VISION_KEY <- "********************" # your API key
  u <- paste0("https://vision.googleapis.com/v1/images:annotate?key=", CROWD_VISION_KEY)
  img <- readBin(f, "raw", size)      # read the image as raw bytes
  base64_encoded <- base64encode(img) # the API expects base64-encoded content
  body <- list(requests = list(image = list(content = base64_encoded),
                               features = list(type = type,
                                               maxResults = 5),
                               imageContext = list(languageHints = "ja")) # hint: Japanese
  )

  res <- POST(url = u,
              encode = "json",
              body = body,
              content_type_json())
  return(res)
}


library(imguR)
user_name <- '*********' # your imgur username
tkn <- imgur_login()
if(!account_verified(token = tkn))
  send_verification(token = tkn)
account(token = tkn)

album <- get_album("*****") # your album URL: http://imgur.com/a/*****
album_title <- album$title
imagesNumber <- album$images_count

# Run OCR on every image in the album and append the results to a CSV file.
for(i in 1:imagesNumber){
  filename <- album$images[[i]]$link
  size <- album$images[[i]]$size
  res <- getResult(filename, "TEXT_DETECTION", size)
  textbox <- content(res)$responses[[1]]$textAnnotations[[1]]$description
  textbox <- gsub("\n", "", textbox) # drop line breaks inside the detected text
  temp <- c(filename, textbox, album_title)
  write.table(x = t(temp), file = "imgur_ocr.csv", col.names = FALSE, sep = ",", append = TRUE)
}
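
As noted above, it is easy to point the same getResult() function at a local image instead of an imgur album. A minimal sketch is below; the path is a placeholder.

# OCR on a single local image file (placeholder path).
local_file <- "C:/screenshots/screenshot.png"
res <- getResult(local_file, "TEXT_DETECTION", file.size(local_file))
textbox <- content(res)$responses[[1]]$textAnnotations[[1]]$description
cat(gsub("\n", "", textbox))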


If you are a beginner at programming, the following steps may be helpful.

  1. Install the R language
  2. Install RStudio
  3. Get an API key for Cloud Vision
  4. Upload your screenshots to imgur and add them to an album

(See also notes *3 and *4 below.)

*1: Tesseract 4.0 alpha has been released; its OCR engine is based on a long short-term memory (LSTM) neural network.

*2: It is just more convenient for me.

*3: Detecting only the largest text region first (with OpenCV or similar) may be useful.

*4: Running OCR on 7336 images with Cloud Vision cost me 1066 yen, but I am within the free trial period.