Today we'll look at how to use nltk.Tree.
First, import the class:
>>> from nltk import Tree
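1. Build a syntax tree
Take this sentence as an example:
Gary plays baseball
Its syntax tree (parse tree) has S at the root with two children: an NP covering "Gary" and a VP that splits into a VT ("plays") and an NP ("baseball").
Build this tree with nltk.Tree: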
>>> tree=Tree('S',[Tree('NP',['Gary']),
... Tree('VP',[Tree('VT',['plays']),
... Tree('NP',['baseball'])])])
>>> tree.draw()
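This opens a window that draws the parse tree.
draw() needs Tk (the python-tk package). If it is not installed, you can get the tree as a bracketed string instead (NLTK 2 returned this string from pprint(); in NLTK 3, pprint() prints the tree and pformat() returns the string):
>>> tree.pformat()
'(S (NP Gary) (VP (VT plays) (NP baseball)))'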
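2. Subtrees, nodes and leaves
Next, we can navigate the tree.
Indexing returns a subtree or, at the bottom level, a leaf; a tuple index such as tree[1,1] follows a path of child positions down from the root. label() returns a node's label (NLTK 2 exposed it as the .node attribute, which NLTK 3 no longer supports).
>>> tree[1]
Tree('VP', [Tree('VT', ['plays']), Tree('NP', ['baseball'])])
>>> tree[1].label()
'VP'
>>> tree[1,1]
Tree('NP', ['baseball'])
>>> tree[1,1].label()
'NP'
>>> tree[1,1,0]
'baseball'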
>>> tree.leaves()
['Gary', 'plays', 'baseball']
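As a small sketch of walking the whole tree instead of indexing one position at a time, the snippet below uses subtrees() and pos(). It assumes NLTK 3 or later, where label() is the accessor for a node's label; the variable names are only illustrative.

```python
from nltk import Tree

tree = Tree('S', [Tree('NP', ['Gary']),
                  Tree('VP', [Tree('VT', ['plays']),
                              Tree('NP', ['baseball'])])])

# subtrees() yields the tree itself and then every nested subtree, top-down.
labels = [t.label() for t in tree.subtrees()]

# An optional filter keeps only the subtrees we care about, here the NPs.
np_words = [t.leaves() for t in tree.subtrees(lambda t: t.label() == 'NP')]

# pos() pairs each leaf with the label of the node directly above it.
tagged = tree.pos()

print(labels)    # ['S', 'NP', 'VP', 'VT', 'NP']
print(np_words)  # [['Gary'], ['baseball']]
print(tagged)    # [('Gary', 'NP'), ('plays', 'VT'), ('baseball', 'NP')]
```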
3. Grammar and Chomsky normal form
We can also list the grammar rules (productions) that this tree uses.
Call productions()
to get them:
>>> tree.productions()
[S -> NP VP, NP -> 'Gary', VP -> VT NP, VT -> 'plays', NP -> 'baseball']
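Each item in that list is a Production object, not a string. The sketch below, which assumes NLTK 3, separates the rules that rewrite a nonterminal into a word (lexical) from the rules that only combine nonterminals; the names lexical and structural are just illustrative.

```python
from nltk import Tree

tree = Tree('S', [Tree('NP', ['Gary']),
                  Tree('VP', [Tree('VT', ['plays']),
                              Tree('NP', ['baseball'])])])

rules = tree.productions()

# is_lexical() is true when the right-hand side contains a terminal (a word).
lexical = [str(p) for p in rules if p.is_lexical()]

# is_nonlexical() is true when the right-hand side has only nonterminals.
structural = [str(p) for p in rules if p.is_nonlexical()]

print(structural)  # ['S -> NP VP', 'VP -> VT NP']
print(lexical)     # ["NP -> 'Gary'", "VT -> 'plays'", "NP -> 'baseball'"]
```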
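Next, we can convert the tree so that its productions are in Chomsky normal form (CNF).
(If you are not familiar with CNF, consult a textbook on the theory of computation.)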
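First, look at an example that is not in CNF:
>>> tree2 = Tree('S', [Tree('NP', ['Gary']),
... Tree('VT', ['play']),
... Tree('NP', ['baseball'])])
>>> tree2.productions()
[S -> NP VT NP, NP -> 'Gary', VT -> 'play', NP -> 'baseball']
>>> tree2.draw()
The rule S -> NP VT NP has three nonterminals on its right-hand side, but CNF allows only rules of the form A -> B C or A -> 'word', so this tree is not in CNF.
chomsky_normal_form() fixes this. It modifies the tree in place and returns None:
>>> tree2.chomsky_normal_form()
>>> tree2.productions()
[S -> NP S|<VT-NP>, NP -> 'Gary', S|<VT-NP> -> VT NP, VT -> 'play', NP -> 'baseball']
>>> tree2.draw()
The conversion introduces a new node, S|<VT-NP>, which groups VT and NP so that every rule has at most two nonterminals on its right-hand side.
The tree is now in CNF.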
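To see where a label like S|<VT-NP> comes from, here is a minimal pure-Python sketch of right-factoring on (label, children) tuples. It is not NLTK's implementation: the right_factor and _factor helpers and the tuple representation are assumptions made for illustration, and only the naming scheme imitates the output shown above.

```python
def label_of(node):
    """Return the label of a (label, children) node, or the word for a leaf."""
    return node if isinstance(node, str) else node[0]


def _factor(label, origin, children):
    """Give `label` at most two children by pushing the rest into a new node."""
    if len(children) <= 2:
        return (label, children)
    rest = children[1:]
    # The new node is named after the original parent and the children it groups.
    new_label = origin + "|<" + "-".join(label_of(c) for c in rest) + ">"
    return (label, [children[0], _factor(new_label, origin, rest)])


def right_factor(node):
    """Recursively binarize a tree given as nested (label, children) tuples."""
    if isinstance(node, str):  # a leaf (word) is left unchanged
        return node
    label, children = node
    children = [right_factor(c) for c in children]
    return _factor(label, label, children)


flat = ("S", [("NP", ["Gary"]), ("VT", ["play"]), ("NP", ["baseball"])])
binary = right_factor(flat)
print(binary[1][1][0])  # S|<VT-NP>
```

The first child stays where it is and the remaining children move under the new node, which is why the converted grammar contains S -> NP S|<VT-NP> and S|<VT-NP> -> VT NP.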
4. Parse a tree from a string
Suppose we have a bracketed string like this:
>>> s = r"(S (NP Gary) (VP (VT plays) (NP baseball)))"
To build a tree from this string, use Tree.fromstring() (called Tree.parse() in NLTK 2):
>>> tree3 = Tree.fromstring(s)
>>> tree3.pformat()
'(S (NP Gary) (VP (VT plays) (NP baseball)))'
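As a quick check that the string form and the constructor form describe the same tree, the sketch below parses the string and compares the result with a tree built by hand. It assumes NLTK 3, where the parser is Tree.fromstring() and pformat() returns the bracketed string; the variable names are illustrative.

```python
from nltk import Tree

s = "(S (NP Gary) (VP (VT plays) (NP baseball)))"

# Build the tree from its bracketed string form.
parsed = Tree.fromstring(s)

# Build the same tree explicitly with nested Tree constructors.
built = Tree('S', [Tree('NP', ['Gary']),
                   Tree('VP', [Tree('VT', ['plays']),
                               Tree('NP', ['baseball'])])])

# Trees compare equal when their labels and children match.
same_tree = parsed == built

# pformat() turns the tree back into a bracketed string.
round_trip = parsed.pformat()

print(same_tree)        # True
print(round_trip == s)  # True
```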
Conclusion:
For more reference material, see:
tutorial:
https://nltk.googlecode.com/svn/trunk/doc/howto/tree.html
http://www.mit.edu/~6.863/spring2011/labs/nltk-tree-pages.pdf
api documentation:
http://nltk.googlecode.com/svn/trunk/doc/api/nltk.tree.Tree-class.html#productions
source code:
http://www.nltk.org/_modules/nltk/tree.html