python/pyocr: init at 0.4.4

This package is a bit more involved because it assumes a lot of paths
being there in a FHS compliant way, so we need to patch the data and
binary directories for Tesseract and Cuneiform.

I've also tried to get the tests working, but they produce different
results comparing input/output. This is probably related to the
following issue:

https://github.com/jflesch/pyocr/issues/52

So I've disabled certain tests that fail but don't generally impede the
functionality of pyocr.

Tested by building against Python 3.3, 3.4, 3.5 and 3.6.

Signed-off-by: aszlig <aszlig@redmoonstudios.org>
This commit is contained in:
aszlig
2016-12-19 15:23:49 +01:00
parent 02a9da65c9
commit e186a8dba9

View File

@@ -20538,6 +20538,64 @@ in {
};
};
pyocr = buildPythonPackage rec {
name = "pyocr-${version}";
version = "0.4.4";
# Don't fetch from PYPI because it doesn't contain tests.
src = pkgs.fetchFromGitHub {
owner = "jflesch";
repo = "pyocr";
rev = version;
sha256 = "09s7dxin8ams0f3xab60f45l3nn236a8win9yfyq9aqy9mm946ak";
};
postPatch = ''
sed -i \
-e 's,^\(TESSERACT_CMD *= *\).*,\1"${pkgs.tesseract}/bin/tesseract",' \
-e 's,^\(CUNEIFORM_CMD *= *\).*,\1"${pkgs.cuneiform}/bin/cuneiform",' \
-e '/^CUNIFORM_POSSIBLE_PATHS *= *\[/,/^\]$/ {
c CUNIFORM_POSSIBLE_PATHS = ["${pkgs.cuneiform}/share/cuneiform"]
}' src/pyocr/{tesseract,cuneiform}.py
sed -i -r \
-e 's,"libtesseract\.so\.3","${pkgs.tesseract}/lib/libtesseract.so",' \
-e 's,^(TESSDATA_PREFIX *=).*,\1 "${pkgs.tesseract}/share/tessdata",' \
src/pyocr/libtesseract/tesseract_raw.py
# Disable specific tests that are probably failing because of this issue:
# https://github.com/jflesch/pyocr/issues/52
for test in $disabledTests; do
file="''${test%%:*}"
fun="''${test#*:}"
echo "$fun = unittest.expectedFailure($fun)" >> "tests/tests_$file.py"
done
'';
disabledTests = [
"cuneiform:TestTxt.test_basic"
"cuneiform:TestTxt.test_european"
"cuneiform:TestTxt.test_french"
"cuneiform:TestWordBox.test_basic"
"cuneiform:TestWordBox.test_european"
"cuneiform:TestWordBox.test_french"
"libtesseract:TestBasicDoc.test_basic"
"libtesseract:TestDigitLineBox.test_digits"
"libtesseract:TestLineBox.test_japanese"
"libtesseract:TestTxt.test_japanese"
"libtesseract:TestWordBox.test_japanese"
"tesseract:TestDigitLineBox.test_digits"
"tesseract:TestTxt.test_japanese"
];
propagatedBuildInputs = [ self.pillow self.six ];
meta = {
homepage = "https://github.com/jflesch/pyocr";
description = "A Python wrapper for Tesseract and Cuneiform";
license = licenses.gpl3Plus;
};
};
pyparsing = buildPythonPackage rec {
name = "pyparsing-${version}";