Open Source Credits

HanyuGuide uses open-source software and several licensed language datasets. This page collects the notices that apply to its dictionary data, character metadata, stroke-order assets, pinyin conversion, frequency and HSK data, example sentences, and software libraries.

Dictionary Data

CC-CEDICT

Chinese dictionary data is derived from CC-CEDICT and is licensed under the Creative Commons Attribution-ShareAlike 4.0 International license.

Source: MDBG / CC-CEDICT

Make Me a Hanzi Dictionary Metadata

Character decomposition, radical, and etymology-hint fields are derived from Make Me a Hanzi and are provided under the GNU Lesser General Public License v3.

Source: skishore/makemeahanzi

Make Me a Hanzi Stroke Graphics

Stroke-order paths and median-line data are derived from the Make Me a Hanzi stroke assets. Those assets are in turn derived from fonts released by Arphic Technology under the Arphic Public License and remain subject to that license.

Source: skishore/makemeahanzi

Unicode Unihan Database

Additional character readings, variant references, and Unicode definitions are derived from the Unicode Unihan Database and are used under the Unicode License.

Source: unicode.org/charts/unihan.html

Pinyin Conversion Runtime

The in-browser Chinese-to-pinyin tool uses the pinyin-pro runtime library, which is licensed under the MIT License. The public pinyin conversion API instead uses a server-side PHP converter backed by an imported dictionary table.

HanyuGuide also hosts a first-party public JSON dictionary export derived from the `complete.json` data in the official @pinyin-pro/data package, and imports that export into the server-side conversion table.

The upstream `complete.json` data source is described by the `pinyin-pro-data` project as a pinyin collection built from the jieba Chinese segmentation dictionary. HanyuGuide does not rely on the package's separate `modern.json` dataset for this public export.

Sources: zh-lx/pinyin-pro, chinese-data/pinyin-pro-data, fxsjy/jieba, and dictionary.hanyuguide.com/complete.json

SUBTLEX-CH Frequency Data

Word frequency ranks are derived from SUBTLEX-CH: Chinese Word and Character Frequencies Based on Film Subtitles by Qing Cai and Marc Brysbaert, published in PLOS ONE. The dataset is licensed under the Creative Commons Attribution 4.0 International license. HanyuGuide normalizes the published word-frequency table and maps the ranks to learner-facing subtitle-frequency labels.

Sources: PLOS ONE article, supporting frequency files, and Figshare dataset mirror

Official HSK 3.0 Syllabus

HSK level labels and official HSK study-list membership are derived from the HSK 3.0 examination syllabus published by CTI / Chinese Test in November 2025, which takes effect in July 2026.

Source: Chinese Test HSK page

Generated Content

HanyuGuide also uses configured AI providers to generate learner-facing dictionary enrichments such as translations, usage notes, semantic relations, and some example sentences. These generated outputs are stored as first-party application content.

Example Sentences

Some legacy example sentences in the dictionary may still be sourced from Tatoeba and remain subject to the CC BY 2.0 FR license. HanyuGuide is phasing out these legacy entries and replacing them with first-party AI-generated examples where available. For each remaining Tatoeba-backed entry, HanyuGuide stores the source URL and external sentence ID so that the example links back to its original Tatoeba page for contributor attribution.

Source: tatoeba.org

Software Libraries

Beyond the dictionary-specific sources above, HanyuGuide depends on open-source packages from the Laravel, PHP, JavaScript, and Expo ecosystems. Each package distributed with the application remains subject to its own license.