Structure of an XML document

Elements and attributes

An XML document is made up of elements, each delimited by a start tag and an end tag as follows ('tipo_elemento' is a placeholder for the element type):

<tipo_elemento>...</tipo_elemento>

An element can carry attributes, each with a specific value, according to the following syntax ('nome_attributo' and 'valore_attributo' are placeholders for an attribute's name and value):

<tipo_elemento nome_attributo1="valore_attributo1" nome_attributo2="valore_attributo2"...>...</tipo_elemento>
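
As a minimal sketch of how a parser exposes this syntax, the following uses Python's standard xml.etree.ElementTree module. The element name 'word' and its two attributes are borrowed from the treebank example on this page purely for illustration.

```python
import xml.etree.ElementTree as ET

# One element with two attributes; names and values are illustrative only.
element = ET.fromstring('<word form="posse" lemma="possum1">...</word>')

print(element.tag)           # the element type
print(element.attrib)        # attribute names mapped to their values
print(element.get("lemma"))  # the value of a single attribute
```

Attribute values are always returned as strings, so numeric values such as identifiers need an explicit conversion if they are to be used as numbers.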

An element can contain text.

<tipo_elemento>testo</tipo_elemento>

An element can contain other elements.

<tipo_elemento>
    <tipo_elemento>...</tipo_elemento>
    <tipo_elemento>...</tipo_elemento>
    ...
</tipo_elemento>

An element can be empty.

<tipo_elemento/>
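
The three kinds of content can be told apart programmatically. The sketch below again assumes Python's xml.etree.ElementTree; the element names 'sentence', 'group', 'word' and 'pause' are hypothetical and chosen only for this illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical fragment combining the three cases:
# text content, nested elements, and an empty element.
root = ET.fromstring(
    "<sentence>"
    "<word>testo</word>"
    "<group><word>a</word><word>b</word></group>"
    "<pause/>"
    "</sentence>"
)

text_element, parent_element, empty_element = list(root)

print(text_element.text)                         # text contained in the element
print([child.tag for child in parent_element])   # types of the nested elements
print(empty_element.text, len(empty_element))    # no text and no child elements
```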

Examples from annotated corpora

An example from the Latin Dependency Treebank.

Id fieri posse, si suas copias Haedui in fines Bellovacorum introduxerint et eorum agros populari coeperint. His datis mandatis eum a se dimittit.

<treebank>
    [...]
    <sentence id="17" document_id="Perseus:text:1999.02.0002" subdoc="Book=2:chapter=5" span="Id0:coeperint0">
        <word id="1" form="Id" lemma="is1" postag="p-s---na-" head="3" relation="SBJ" />
        <word id="2" form="fieri" lemma="fio1" postag="v--pnp---" head="3" relation="OBJ" />
        <word id="3" form="posse" lemma="possum1" postag="v--pna---" head="0" relation="ExD" />
        <word id="4" form="," lemma="comma1" postag="u--------" head="5" relation="AuxX" />
        <word id="5" form="si" lemma="si1" postag="c--------" head="2" relation="AuxC" />
        <word id="6" form="suas" lemma="suus1" postag="a-p---fa-" head="7" relation="ATR" />
        <word id="7" form="copias" lemma="copia1" postag="n-p---fa-" head="12" relation="OBJ" />
        <word id="8" form="Haedui" lemma="Aedui1" postag="n-p---mn-" head="12" relation="SBJ" />
        <word id="9" form="in" lemma="in1" postag="r--------" head="12" relation="AuxP" />
        <word id="10" form="fines" lemma="finis1" postag="n-p---ma-" head="9" relation="OBJ" />
        <word id="11" form="Bellovacorum" lemma="Bellovaci1" postag="n-p---mg-" head="10" relation="ATR" />
        <word id="12" form="introduxerint" lemma="introduco1" postag="v3ptia---" head="13" relation="ADV_CO" />
        <word id="13" form="et" lemma="et1" postag="c--------" head="5" relation="COORD" />
        <word id="14" form="eorum" lemma="is1" postag="p-p---mg-" head="15" relation="ATR" />
        <word id="15" form="agros" lemma="ager1" postag="n-p---ma-" head="16" relation="OBJ" />
        <word id="16" form="populari" lemma="populor1" postag="v--pnp---" head="17" relation="OBJ" />
        <word id="17" form="coeperint" lemma="coepio1" postag="v3ptia---" head="13" relation="ADV_CO" />
    </sentence>
    <sentence id="18" document_id="Perseus:text:1999.02.0002" subdoc="Book=2:chapter=5" span="His1:dimittit0">
        <word id="1" form="His" lemma="hic1" postag="p-p---nb-" head="3" relation="ATR" />
        <word id="2" form="datis" lemma="do1" postag="t-prppnb-" head="7" relation="ADV" />
        <word id="3" form="mandatis" lemma="mandatum1" postag="n-p---nb-" head="2" relation="SBJ" />
        <word id="4" form="eum" lemma="is1" postag="p-s---ma-" head="7" relation="OBJ" />
        <word id="5" form="a" lemma="ab1" postag="r--------" head="7" relation="AuxP" />
        <word id="6" form="se" lemma="sui1" postag="p-s---mb-" head="5" relation="OBJ" />
        <word id="7" form="dimittit" lemma="dimitto1" postag="v3spia---" head="0" relation="PRED" />
    </sentence>
    [...]
</treebank>
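
A treebank in this format can be read with a few lines of code. The following is a sketch using Python's xml.etree.ElementTree on sentence 18, copied from the example above without the elided parts. It assumes that 'head' refers to the 'id' of another word in the same sentence and that head="0" marks a word with no head; the label "ROOT" is introduced here only for display.

```python
import xml.etree.ElementTree as ET

# Sentence 18 from the example above, wrapped in the root element.
xml_source = """
<treebank>
    <sentence id="18" document_id="Perseus:text:1999.02.0002" subdoc="Book=2:chapter=5" span="His1:dimittit0">
        <word id="1" form="His" lemma="hic1" postag="p-p---nb-" head="3" relation="ATR" />
        <word id="2" form="datis" lemma="do1" postag="t-prppnb-" head="7" relation="ADV" />
        <word id="3" form="mandatis" lemma="mandatum1" postag="n-p---nb-" head="2" relation="SBJ" />
        <word id="4" form="eum" lemma="is1" postag="p-s---ma-" head="7" relation="OBJ" />
        <word id="5" form="a" lemma="ab1" postag="r--------" head="7" relation="AuxP" />
        <word id="6" form="se" lemma="sui1" postag="p-s---mb-" head="5" relation="OBJ" />
        <word id="7" form="dimittit" lemma="dimitto1" postag="v3spia---" head="0" relation="PRED" />
    </sentence>
</treebank>
"""

treebank = ET.fromstring(xml_source)
sentence = treebank.find("sentence")
words = sentence.findall("word")

# The surface text is stored in the 'form' attribute of each word.
text = " ".join(word.get("form") for word in words)

# Resolve each 'head' reference to the form of the governing word.
# Assumption: head="0" matches no word id and therefore marks the root.
form_by_id = {word.get("id"): word.get("form") for word in words}
dependencies = [
    (word.get("form"), word.get("relation"), form_by_id.get(word.get("head"), "ROOT"))
    for word in words
]

print(text)
for dependent, relation, head in dependencies:
    print(f"{dependent} --{relation}--> {head}")
```

Joining the forms with single spaces is a simplification: it would also put a space before punctuation tokens such as the comma in sentence 17.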

An example from the Slovene corpus jos100k.

Večjega občutka sreče si ni mogoče predstavljati. 'A greater joy cannot be imagined.'

Note that in the previous example the words of the text are stored as values of the 'form' attribute of the 'word' elements, whereas in this example they are stored as the text content of the 'w' elements.

[...]
<s xml:id="F0002008.72.10" n="7 8">
    <w xml:id="F0002008.72.10.1" lemma="velik" msd="Pppmer">Večjega</w>
    <S/>
    <w xml:id="F0002008.72.10.2" lemma="občutek" msd="Somer">občutka</w>
    <S/>
    <w xml:id="F0002008.72.10.3" lemma="sreča" msd="Sozer">sreče</w>
    <S/>
    <w xml:id="F0002008.72.10.4" lemma="se" msd="Zp---d--k">si</w>
    <S/>
    <w xml:id="F0002008.72.10.5" lemma="biti" msd="Gp-ste-d">ni</w>
    <S/>
    <w xml:id="F0002008.72.10.6" lemma="mogoče" msd="Rsn">mogoče</w>
    <S/>
    <w xml:id="F0002008.72.10.7" lemma="predstavljati" msd="Ggnn">predstavljati</w>
    <c xml:id="F0002008.72.10.8">.</c>
    <S/>
</s>
[...]
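
Because the words are stored as element content here, they are read from the text of each element rather than from an attribute. The sketch below uses Python's xml.etree.ElementTree on the sentence above, without the elided parts. It assumes that the empty 'S' elements stand for the spaces between tokens; the constant name XML_ID is introduced here only for convenience.

```python
import xml.etree.ElementTree as ET

# ElementTree expands the reserved 'xml:' prefix to this namespace URI.
XML_ID = "{http://www.w3.org/XML/1998/namespace}id"

xml_source = """
<s xml:id="F0002008.72.10" n="7 8">
    <w xml:id="F0002008.72.10.1" lemma="velik" msd="Pppmer">Večjega</w>
    <S/>
    <w xml:id="F0002008.72.10.2" lemma="občutek" msd="Somer">občutka</w>
    <S/>
    <w xml:id="F0002008.72.10.3" lemma="sreča" msd="Sozer">sreče</w>
    <S/>
    <w xml:id="F0002008.72.10.4" lemma="se" msd="Zp---d--k">si</w>
    <S/>
    <w xml:id="F0002008.72.10.5" lemma="biti" msd="Gp-ste-d">ni</w>
    <S/>
    <w xml:id="F0002008.72.10.6" lemma="mogoče" msd="Rsn">mogoče</w>
    <S/>
    <w xml:id="F0002008.72.10.7" lemma="predstavljati" msd="Ggnn">predstavljati</w>
    <c xml:id="F0002008.72.10.8">.</c>
    <S/>
</s>
"""

sentence = ET.fromstring(xml_source)

# Rebuild the running text: words ('w') and punctuation ('c') contribute
# their text content; 'S' is assumed to stand for a single space.
pieces = []
for child in sentence:
    if child.tag in ("w", "c"):
        pieces.append(child.text)
    elif child.tag == "S":
        pieces.append(" ")
text = "".join(pieces).strip()

# Lemmas are still stored as attribute values, as in the Latin example.
lemmas = [word.get("lemma") for word in sentence.iter("w")]

print(sentence.get(XML_ID))
print(text)
print(lemmas)
```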
