HanyuGuide uses open-source software and several licensed language datasets. This page collects the notices that apply to the dictionary, character metadata, stroke-order assets, and example sentence sources.
CC-CEDICT
Chinese dictionary data is derived from CC-CEDICT and is licensed under the Creative Commons Attribution-ShareAlike 4.0 International license.
Source: MDBG / CC-CEDICT
Character decomposition, radical, and etymology-hint fields are derived from Make Me a Hanzi and are provided under the GNU Lesser General Public License v3.
Source: skishore/makemeahanzi
Stroke-order paths and median-line data are derived from Make Me a Hanzi stroke assets, which in turn are derived from Arphic public-license fonts and remain subject to the Arphic Public License.
Source: skishore/makemeahanzi
Additional character readings, variant references, and Unicode definitions are derived from the Unicode Unihan Database and are used under the Unicode License.
Source: unicode.org/charts/unihan.html
The browser Chinese-to-pinyin tool uses the pinyin-pro runtime library, which is licensed under the MIT License. The public pinyin conversion API uses a server-side PHP converter backed by an imported dictionary table. HanyuGuide also hosts a first-party public JSON dictionary export derived from the `complete.json` data in the official @pinyin-pro/data package, and imports that export into the server-side conversion table.
The upstream `complete.json` data source is described by the `pinyin-pro-data` project as a pinyin collection built from the jieba Chinese segmentation dictionary. HanyuGuide does not rely on the package's separate `modern.json` dataset for this public export.
Sources: zh-lx/pinyin-pro, chinese-data/pinyin-pro-data, fxsjy/jieba, and dictionary.hanyuguide.com/complete.json
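As a rough illustration of how a converter backed by a dictionary table could work, the TypeScript sketch below looks up words in a small in-memory map using greedy longest-match. Everything in it is an assumption made for illustration: the `PinyinTable` type, the `toPinyin` function, the sample entries, and the matching strategy are hypothetical. This page does not describe the actual `complete.json` schema or the logic of the server-side PHP converter.

```typescript
// Hypothetical table shape: word -> space-separated pinyin reading.
// The real complete.json schema is not described on this page.
type PinyinTable = Map<string, string>;

// Tiny illustrative table standing in for the imported dictionary table.
const table: PinyinTable = new Map([
  ["汉语", "hàn yǔ"],
  ["你好", "nǐ hǎo"],
  ["好", "hǎo"],
  ["你", "nǐ"],
]);

// Greedy longest-match lookup (an assumed strategy): at each position, try
// the longest candidate word first; pass unknown characters through unchanged.
function toPinyin(text: string, dict: PinyinTable, maxWordLength = 4): string {
  const chars = Array.from(text); // iterate by code point, not UTF-16 unit
  const out: string[] = [];
  let i = 0;
  while (i < chars.length) {
    let matched = false;
    const longest = Math.min(maxWordLength, chars.length - i);
    for (let len = longest; len >= 1; len--) {
      const word = chars.slice(i, i + len).join("");
      const reading = dict.get(word);
      if (reading !== undefined) {
        out.push(reading);
        i += len;
        matched = true;
        break;
      }
    }
    if (!matched) {
      out.push(chars.slice(i, i + 1).join(""));
      i += 1;
    }
  }
  return out.join(" ");
}

console.log(toPinyin("你好汉语", table)); // prints "nǐ hǎo hàn yǔ"
```

Preferring the longest match lets multi-character words take priority over single-character readings, which matters for characters with more than one pronunciation.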
Word frequency ranks are derived from SUBTLEX-CH: Chinese Word and Character Frequencies Based on Film Subtitles by Qing Cai and Marc Brysbaert, published in PLOS ONE. The dataset is licensed under the Creative Commons Attribution 4.0 International license.
Sources: PLOS ONE article, supporting frequency files, and Figshare dataset mirror
HSK level labels and official HSK study-list membership are derived from the CTI / Chinese Test HSK 3.0 examination syllabus published in November 2025 and effective July 2026.
Source: Chinese Test HSK page
HanyuGuide also generates learner-facing dictionary enrichments such as translations, usage notes, semantic relations, and some example sentences with configured AI providers. Those generated outputs are stored as first-party application content.
Some legacy dictionary example sentences may still be sourced from Tatoeba and remain subject to CC-BY 2.0 FR. Those legacy entries are being phased out as HanyuGuide replaces them with first-party generated examples. For any remaining Tatoeba-backed entries, HanyuGuide stores the source URL and external sentence ID so that each retained example can link back to its original Tatoeba page for contributor attribution.
Source: tatoeba.org
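The stored source URL and external sentence ID are enough to render an attribution line for a retained example. The TypeScript sketch below shows one way this might look. The `LegacyExample` interface, its field names, the `attributionLine` function, and the placeholder sentence ID are all hypothetical and are not taken from HanyuGuide's code.

```typescript
// Hypothetical record shape for a retained Tatoeba-backed example sentence.
interface LegacyExample {
  text: string;
  sourceUrl: string; // stored link to the original Tatoeba page
  externalSentenceId: string; // stored Tatoeba sentence ID
}

// Build a plain-text attribution line that points back to the original page.
function attributionLine(example: LegacyExample): string {
  return (
    `Sentence #${example.externalSentenceId} from Tatoeba ` +
    `(${example.sourceUrl}), licensed under CC-BY 2.0 FR`
  );
}

// Placeholder values for illustration only; this is not a real sentence ID.
const sample: LegacyExample = {
  text: "你好。",
  sourceUrl: "https://tatoeba.org/en/sentences/show/0000",
  externalSentenceId: "0000",
};

console.log(attributionLine(sample));
```

Keeping the URL and ID on each record means the attribution can be generated when the entry is displayed, without another lookup.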
Beyond the dictionary-specific sources above, HanyuGuide depends on open-source packages from the Laravel, PHP, JavaScript, and Expo ecosystems. Their individual licenses remain in effect for the packaged software distributed with the application.