String Pod

Matching texts across languages

Features

  • Normalize text with options
  • Check if a text contains a substring
  • Parse numbers from text
  • Compare pinyin of two texts

Usage

Contains

Check if a text contains a substring, with options.

stringpod contains "Hello, world!" "world"
stringpod contains "  Hello, world!  " "lo, wor" --options "strip_whitespace,ignore_case"
stringpod contains "歌曲(純音樂)" "(纯音乐)" --options "ignore_chinese_variant"
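The idea behind these options can be sketched in plain Python: apply the same normalization to both strings, then do an ordinary substring test. This is only an illustration, not stringpod's implementation. The `contains` helper and the two-entry `T2S` table are hypothetical, and a real Traditional-to-Simplified conversion needs a complete mapping.

```python
# Hypothetical two-entry Traditional -> Simplified table, just enough for the
# example above; a real converter needs a full mapping.
T2S = str.maketrans({"純": "纯", "樂": "乐"})


def contains(text, substring, *, strip_whitespace=False,
             ignore_case=False, ignore_chinese_variant=False):
    """Return True if `substring` occurs in `text` after normalizing both."""

    def prepare(s):
        if strip_whitespace:
            s = s.strip()          # drop leading and trailing whitespace
        if ignore_case:
            s = s.lower()          # compare case-insensitively
        if ignore_chinese_variant:
            s = s.translate(T2S)   # fold Traditional characters to Simplified
        return s

    return prepare(substring) in prepare(text)


print(contains("Hello, world!", "world"))
print(contains("歌曲(純音樂)", "(纯音乐)"))
print(contains("歌曲(純音樂)", "(纯音乐)", ignore_chinese_variant=True))
```

Normalizing both sides with the same function keeps the comparison symmetric, so it does not matter which of the two strings uses the Traditional form.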

Normalize

Normalize text to a standard form.

stringpod normalize "Hello, World!!!"
stringpod normalize "    Hello,   World!!!" --options "all"
stringpod normalize "歌曲(純音樂)" --options "ignore_chinese_variant"

Normalizer Options

  • strip_whitespace: Strip leading and trailing whitespace from the text (default: False)
  • remove_whitespace: Remove all whitespace characters from the text (default: False)
      ◦ strip_whitespace is unnecessary when remove_whitespace is True
  • ignore_chinese_variant: Ignore Chinese variant (default: False)
  • ignore_case: Ignore case (default: False)
      ◦ English will be converted to lowercase
      ◦ Chinese will be converted to simplified Chinese
  • nfkc: Normalize to NFKC (default: True)
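As a rough illustration of how the whitespace, case, and NFKC options interact, here is a minimal standard-library sketch. The `normalize_text` function is hypothetical and is not stringpod's API. The order in which the steps run is an assumption, and `ignore_chinese_variant` is left out because it needs a character conversion table.

```python
import unicodedata


def normalize_text(text, *, strip_whitespace=False, remove_whitespace=False,
                   ignore_case=False, nfkc=True):
    """Apply the selected normalization steps and return the result."""
    if nfkc:
        # NFKC folds compatibility forms, e.g. full-width letters to ASCII.
        text = unicodedata.normalize("NFKC", text)
    if remove_whitespace:
        # Removing all whitespace makes a separate strip step redundant.
        text = "".join(text.split())
    elif strip_whitespace:
        text = text.strip()
    if ignore_case:
        text = text.lower()
    return text


print(normalize_text("    Hello,   World!!!", strip_whitespace=True))
print(normalize_text("    Hello,   World!!!", remove_whitespace=True,
                     ignore_case=True))
```

The `elif` mirrors the note in the list above: once all whitespace is removed, stripping the ends has nothing left to do.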

Number Parser

Parse numbers from text.

stringpod number "One hundred and twenty-three"
stringpod number "One hundred and twenty-three" --language "en"

Number Parser Options

  • language: Language of the number (default: en)
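To give a feel for what parsing an English number phrase involves, here is a small self-contained sketch. It is not how stringpod parses numbers. The `words_to_int` function and its word tables are hypothetical, and it only handles simple cardinal phrases.

```python
UNITS = {
    "zero": 0, "one": 1, "two": 2, "three": 3, "four": 4, "five": 5,
    "six": 6, "seven": 7, "eight": 8, "nine": 9, "ten": 10,
    "eleven": 11, "twelve": 12, "thirteen": 13, "fourteen": 14,
    "fifteen": 15, "sixteen": 16, "seventeen": 17, "eighteen": 18,
    "nineteen": 19,
}
TENS = {
    "twenty": 20, "thirty": 30, "forty": 40, "fifty": 50,
    "sixty": 60, "seventy": 70, "eighty": 80, "ninety": 90,
}
SCALES = {"thousand": 1_000, "million": 1_000_000}


def words_to_int(text):
    """Convert a simple English cardinal phrase to an integer."""
    total = 0    # value of completed thousand/million groups
    current = 0  # value of the group being read (always below 1000)
    for word in text.lower().replace("-", " ").split():
        if word == "and":
            continue
        if word in UNITS:
            current += UNITS[word]
        elif word in TENS:
            current += TENS[word]
        elif word == "hundred":
            current = (current or 1) * 100
        elif word in SCALES:
            total += (current or 1) * SCALES[word]
            current = 0
        else:
            raise ValueError(f"unrecognized number word: {word!r}")
    return total + current


print(words_to_int("One hundred and twenty-three"))  # 123
```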

Compare Pinyin

Compare pinyin of two texts.

stringpod cmp-pinyin "你好" "你号"
stringpod cmp-pinyin "你好" "你号" --options "with_tone"
stringpod cmp-pinyin "你好" "你号" --options "spoken_tone"

Pinyin Options

  • with_tone: Whether to include the tone (default: False)
  • spoken_tone: Whether to use the spoken tone (default: False)
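The effect of with_tone can be illustrated with a toy lookup table. This sketch is not stringpod's implementation. The `PINYIN` dictionary and both helpers are hypothetical, cover only the three characters from the example, and do not model spoken_tone.

```python
# Hypothetical mini-dictionary: character -> pinyin with a tone number.
PINYIN = {"你": "ni3", "好": "hao3", "号": "hao4"}


def to_pinyin(text, with_tone=False):
    """Return the pinyin syllables for `text`, optionally keeping tones."""
    syllables = [PINYIN[ch] for ch in text]  # KeyError for unknown characters
    if not with_tone:
        # Drop the trailing tone digit so only the sounds are compared.
        syllables = [s.rstrip("12345") for s in syllables]
    return syllables


def same_pinyin(a, b, with_tone=False):
    """Compare two texts syllable by syllable."""
    return to_pinyin(a, with_tone) == to_pinyin(b, with_tone)


print(same_pinyin("你好", "你号"))                  # True: "ni hao" both times
print(same_pinyin("你好", "你号", with_tone=True))  # False: hao3 vs hao4
```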

Development

poetry install -E dev -E docs -E test
poetry run pre-commit install

CLI Application

poetry run python -m stringpod.cli --help

Python API

poetry run python -m stringpod.stringpod --help

Testing

poetry run pytest # Run Pytest
poetry run python -m stringpod.stringpod -v # Run Doctests

Credits

Core packages:

This package was created with Cookiecutter and the waynerv/cookiecutter-pypackage project template.

License

This project is licensed under the MIT License - see the LICENSE file for details.