当前位置: 首页>>代码示例>>C++>>正文


C++ UNICHARSET::encodable_string方法代码示例

本文整理汇总了C++中UNICHARSET::encodable_string方法的典型用法代码示例。如果您正苦于以下问题:C++ UNICHARSET::encodable_string方法的具体用法?C++ UNICHARSET::encodable_string怎么用?C++ UNICHARSET::encodable_string使用的例子?那么, 这里精选的方法代码示例或许可以为您提供帮助。您也可以进一步了解该方法所在UNICHARSET的用法示例。


在下文中一共展示了UNICHARSET::encodable_string方法的1个代码示例,这些例子默认根据受欢迎程度排序。您可以为喜欢或者感觉有用的代码点赞,您的评价将有助于系统推荐出更棒的C++代码示例。

示例1: Main


//.........这里部分代码省略.........
    const unsigned int kCharsPerLine = (FLAGS_ptsize > 20) ? 50 : 100;
    std::string rand_utf8;
    UNICHARSET unicharset;
    if (FLAGS_render_ngrams && !FLAGS_unicharset_file.empty() &&
        !unicharset.load_from_file(FLAGS_unicharset_file.c_str())) {
      tprintf("Failed to load unicharset from file %s\n",
              FLAGS_unicharset_file.c_str());
      exit(1);
    }

    // If we are rendering ngrams that will be OCRed later, shuffle them so that
    // tesseract does not have difficulties finding correct baseline, word
    // spaces, etc.
    const char *str8 = src_utf8.c_str();
    int len = src_utf8.length();
    int step;
    std::vector<std::pair<int, int> > offsets;
    int offset = SpanUTF8Whitespace(str8);
    while (offset < len) {
      step = SpanUTF8NotWhitespace(str8 + offset);
      offsets.push_back(std::make_pair(offset, step));
      offset += step;
      offset += SpanUTF8Whitespace(str8 + offset);
    }
    if (FLAGS_render_ngrams)
      std::random_shuffle(offsets.begin(), offsets.end());

    for (size_t i = 0, line = 1; i < offsets.size(); ++i) {
      const char *curr_pos = str8 + offsets[i].first;
      int ngram_len = offsets[i].second;
      // Skip words that contain characters not in found in unicharset.
      std::string cleaned = UNICHARSET::CleanupString(curr_pos, ngram_len);
      if (!FLAGS_unicharset_file.empty() &&
          !unicharset.encodable_string(cleaned.c_str(), nullptr)) {
        continue;
      }
      rand_utf8.append(curr_pos, ngram_len);
      if (rand_utf8.length() > line * kCharsPerLine) {
        rand_utf8.append(" \n");
        ++line;
        if (line & 0x1) rand_utf8.append(kSeparator);
      } else {
        rand_utf8.append(kSeparator);
      }
    }
    tlog(1, "Rendered ngram string of size %d\n", rand_utf8.length());
    src_utf8.swap(rand_utf8);
  }
  if (FLAGS_only_extract_font_properties) {
    tprintf("Extracting font properties only\n");
    ExtractFontProperties(src_utf8, &render, FLAGS_outputbase.c_str());
    tprintf("Done!\n");
    return 0;
  }

  int im = 0;
  std::vector<float> page_rotation;
  const char* to_render_utf8 = src_utf8.c_str();

  tesseract::TRand randomizer;
  randomizer.set_seed(kRandomSeed);
  std::vector<std::string> font_names;
  // We use a two pass mechanism to rotate images in both direction.
  // The first pass(0) will rotate the images in random directions and
  // the second pass(1) will mirror those rotations.
  int num_pass = FLAGS_bidirectional_rotation ? 2 : 1;
开发者ID:jan-ruzicka,项目名称:tesseract,代码行数:67,代码来源:text2image.cpp


注:本文中的UNICHARSET::encodable_string方法示例由纯净天空整理自Github/MSDocs等开源代码及文档管理平台,相关代码片段筛选自各路编程大神贡献的开源项目,源码版权归原作者所有,传播和使用请参考对应项目的License;未经允许,请勿转载。