over 4 years ago

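Today we'll look at how to use nltk.Tree.
The interpreter sessions below use the NLTK 2 API; NLTK 3 renamed a few of these calls (the node attribute became the label() method, Tree.parse became Tree.fromstring, and pprint() now prints the tree instead of returning a string).

First, import the module: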

>>> from nltk import Tree

1. Build a syntax tree

For example, take the sentence
Gary plays baseball
The parse tree (syntax tree) for this sentence looks like this:
nltk1.png

Build the tree with nltk.Tree:

>>> tree=Tree('S',[Tree('NP',['Gary']),
...           Tree('VP',[Tree('VT',['plays']),
...                     Tree('NP',['baseball'])])])
>>> tree.draw()

This draws the parse tree.
If python-tk isn't installed, you can print the tree this way instead:

>>> tree.pprint()
'(S (NP Gary) (VP (VT plays) (NP baseball)))'
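
For text-only output on a newer install, here is a minimal sketch that assumes NLTK 3, where pformat() returns the bracketed string and pretty_print() draws the tree as ASCII art. The variable names are only illustrative.

```python
from nltk import Tree

# Same tree as above: "Gary plays baseball"
tree = Tree('S', [Tree('NP', ['Gary']),
                  Tree('VP', [Tree('VT', ['plays']),
                              Tree('NP', ['baseball'])])])

# pformat() returns the bracketed form as a string (assumes NLTK 3)
bracketed = tree.pformat()
print(bracketed)

# str(tree) produces the same bracketed form
same_text = str(tree)

# pretty_print() writes an ASCII-art tree to the terminal; no Tk required
tree.pretty_print()
```

The bracketed string is handy for logging or for saving trees to a file, while pretty_print() is closer to what draw() shows.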

2. Subtrees, Nodes and Leaves

Next, we can operate on the tree,
for example to get its subtrees, nodes and leaves:

>>> tree[1]
Tree('VP', [Tree('VT', ['plays']), Tree('NP', ['baseball'])])
>>> tree[1].node
'VP'
>>> tree[1,1]
Tree('NP', ['baseball'])
>>> tree[1,1].node
'NP'
>>> tree[1,1,0]
'baseball'
>>> tree.leaves()
['Gary', 'plays', 'baseball']
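
The same lookups can be sketched for NLTK 3, assuming the label() method is available in place of the node attribute. The extra calls (subtrees(), pos(), height(), leaf_treeposition()) are other Tree methods that walk the same structure; the variable names are illustrative.

```python
from nltk import Tree

tree = Tree('S', [Tree('NP', ['Gary']),
                  Tree('VP', [Tree('VT', ['plays']),
                              Tree('NP', ['baseball'])])])

root_label = tree.label()        # label of the root node
vp_label = tree[1].label()       # label of the root's second child

# subtrees() visits every Tree node, starting from the root
labels = [t.label() for t in tree.subtrees()]

# pos() pairs each leaf with the label of its parent node
tagged = tree.pos()

# height() counts levels including the leaves
height = tree.height()

# leaf_treeposition(i) gives the index path to the i-th leaf
position = tree.leaf_treeposition(2)
last_word = tree[position]

print(root_label, vp_label, labels, tagged, height, position, last_word)
```

A tree position is a tuple of child indexes, so tree[position] here is the same lookup as tree[1,1,0].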

3. Grammar, Chomsky Normal Form

We can also see which grammar rules generate this tree.
Use productions() to get them:

>>> tree.productions()
[S -> NP VP, NP -> 'Gary', VP -> VT NP, VT -> 'plays', NP -> 'baseball']
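
To see that these productions really form a usable grammar, the sketch below wraps them in a CFG and parses the sentence again. It assumes NLTK 3, where CFG, Nonterminal and ChartParser can be imported from the top-level nltk package. The choice of a chart parser is only for illustration.

```python
from nltk import CFG, ChartParser, Nonterminal, Tree

tree = Tree('S', [Tree('NP', ['Gary']),
                  Tree('VP', [Tree('VT', ['plays']),
                              Tree('NP', ['baseball'])])])

productions = tree.productions()

# Each production has a left-hand side and a tuple of right-hand-side symbols
for p in productions:
    kind = 'lexical' if p.is_lexical() else 'nonlexical'
    print(p.lhs(), '->', p.rhs(), kind)

# Build a grammar with S as the start symbol and parse the sentence again
grammar = CFG(Nonterminal('S'), productions)
parser = ChartParser(grammar)
parses = list(parser.parse(['Gary', 'plays', 'baseball']))

for parsed in parses:
    print(parsed)
```

Because NP can expand to either word, this small grammar also accepts "baseball plays Gary"; the productions describe structure only.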

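Next, we can convert the grammar to Chomsky normal form (CNF).
(If you're not sure what CNF is, see a textbook on the theory of computation.)
First, here is an example that does not conform to CNF: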

>>> tree2 = Tree('S', [Tree('NP', ['Gary']),
...                    Tree('VT', ['plays']),
...                    Tree('NP', ['baseball'])])
>>> tree2.productions()
[S -> NP VT NP, NP -> 'Gary', VT -> 'plays', NP -> 'baseball']
>>> tree2.draw()

nltk2.png

S -> NP VT NP does not conform to Chomsky normal form, because its right-hand side has three nonterminals and CNF allows at most two.
It can be converted with chomsky_normal_form(), which modifies the tree in place:

>>> tree2.chomsky_normal_form()
>>> tree2.productions()
[S -> NP S|<VT-NP>, NP -> 'Gary', S|<VT-NP> -> VT NP, VT -> 'plays', NP -> 'baseball']
>>> tree2.draw()

nltk3.png
The conversion introduces a new node, S|<VT-NP>, and the tree now conforms to CNF.
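
Since chomsky_normal_form() changes the tree in place, it can be useful to work on a copy and to know how to undo the conversion. The sketch below assumes NLTK 3 and uses copy(deep=True) and un_chomsky_normal_form(); the names flat, binarized and restored are illustrative.

```python
from nltk import Tree

# A flat tree whose root has three children, so it is not in CNF
flat = Tree('S', [Tree('NP', ['Gary']),
                  Tree('VT', ['plays']),
                  Tree('NP', ['baseball'])])

# Work on a deep copy so the original tree stays untouched
binarized = flat.copy(deep=True)
binarized.chomsky_normal_form()   # in place; the default is right factoring

cnf_rules = [str(p) for p in binarized.productions()]
print(binarized)
print(cnf_rules)

# un_chomsky_normal_form() removes the artificial S|<...> node again
restored = binarized.copy(deep=True)
restored.un_chomsky_normal_form()
print(restored)
```

The artificial label records where it came from: the parent S, then the siblings it groups (VT and NP), which is what lets the conversion be reversed.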

4. Parse a tree from a string

Suppose we have a string like this:

>>> s=r"(S (NP Gary) (VP (VT plays) (NP baseball)))"

To build a tree from this string, use:

>>> tree3=Tree.parse(s)
>>> tree3.pprint()
'(S (NP Gary) (VP (VT plays) (NP baseball)))'
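
On NLTK 3 the same step can be sketched with Tree.fromstring, which takes over the role of Tree.parse. This assumes NLTK 3 is installed; the names round_trip and words are illustrative.

```python
from nltk import Tree

s = "(S (NP Gary) (VP (VT plays) (NP baseball)))"

# Tree.fromstring reads the bracketed notation (assumes NLTK 3)
tree3 = Tree.fromstring(s)

# Converting back to text gives the original bracketed string
round_trip = str(tree3)
words = tree3.leaves()

print(round_trip)
print(words)
```

The result is an ordinary Tree, so everything shown earlier (indexing, leaves(), productions()) works on it as well.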

Closing notes:

For more reference material, see:
Tutorial:
https://nltk.googlecode.com/svn/trunk/doc/howto/tree.html
http://www.mit.edu/~6.863/spring2011/labs/nltk-tree-pages.pdf
API documentation:
http://nltk.googlecode.com/svn/trunk/doc/api/nltk.tree.Tree-class.html#productions
Source code:
http://www.nltk.org/_modules/nltk/tree.html
