4 February 2003

11 September 2003

MetNet DB XML files and formats description

 

Introduction

The MetNet DB file consists of a group of two or more XML-format text files zipped into a single archive. A single compressed archive saves disk space and simplifies file handling for the user by reducing the number of files that must be opened.

 

All MetNet DB zipped archives must contain one contents.xml file and one topology.xml file. The archive may also contain an index.xml file, a pathways.xml file, or both. These files are optional, but each may occur only once if present.

 

All MetNet DB XML files should be saved as plain ASCII text, and each file should be named after its file type (for example, every topology file should be called topology.xml).
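
The archive rules above can be checked mechanically. The following is a minimal sketch, not part of the MetNet DB tools: the function name check_archive is hypothetical, and it assumes the XML files sit at the top level of the zip archive under exactly the names given above.

```python
import io
import zipfile

REQUIRED = {"contents.xml", "topology.xml"}
OPTIONAL = {"index.xml", "pathways.xml"}


def check_archive(zip_file):
    """Return a list of problems found in a MetNet DB archive (empty if none)."""
    problems = []
    with zipfile.ZipFile(zip_file) as archive:
        names = archive.namelist()
    for name in sorted(REQUIRED | OPTIONAL):
        count = names.count(name)
        if name in REQUIRED and count == 0:
            problems.append(f"missing required file: {name}")
        if count > 1:
            problems.append(f"file occurs more than once: {name}")
    return problems


# Build a tiny in-memory archive (placeholder contents) to exercise the check.
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as archive:
    archive.writestr("contents.xml", "<fileSet/>")
    archive.writestr("index.xml", "<index/>")
buffer.seek(0)
print(check_archive(buffer))  # ['missing required file: topology.xml']
```

The check only looks at file names; it does not validate the XML inside each file.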

 

 

topology.xml file format

The topology file lists the nodes and edges of the graph. All of the other files contain additional details about the biological significance of each node; the topology file contains all of the information required to draw the graph.

 

The topology file is made up of node and edge listings, which are described below. The order in which the nodes and edges are listed does not matter.

 

· each node entry consists of:

    · attribute id - string (unique ID) required

    · attribute molID - string (molecule ID) required

    · attribute nodeName - string (default node name) required

    · element nodeType - string (type of molecule represented by the node) exactly one, required

    · element location - string (cellular location of the node) exactly one, required

· each edge entry consists of:

    · attribute id - string (unique ID) required

    · attribute tail - string (unique ID of the node the edge comes from) required

    · attribute head - string (unique ID of the node the edge points to) required

    · attribute directed - boolean (whether the edge is directed) required

    · attribute strength - decimal (reaction strength represented) optional

    · element edgeType - string (type of reaction represented by the edge) zero or one

    · element certainty - string (certainty that the edge is correct) zero or one
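
As an illustration of the listing above, the sketch below parses a small hypothetical topology file with Python's standard library. The attribute and child element names come from the listing; the root element name topology, the entry element names node and edge, the sample values, and the "true"/"1" spelling of booleans are assumptions, since this description does not fix them.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample. <topology>, <node> and <edge> are assumed element names.
SAMPLE_TOPOLOGY = """\
<topology>
  <node id="n1" molID="glucose" nodeName="Glucose">
    <nodeType>compound</nodeType>
    <location>cytosol</location>
  </node>
  <node id="n2" molID="reaction" nodeName="hexokinase reaction">
    <nodeType>reaction</nodeType>
    <location>cytosol</location>
  </node>
  <edge id="e1" tail="n1" head="n2" directed="true" strength="1.0">
    <edgeType>substrate</edgeType>
  </edge>
</topology>
"""


def read_topology(xml_text):
    """Return (nodes, edges) as lists of plain dictionaries."""
    root = ET.fromstring(xml_text)
    nodes = []
    for node in root.iter("node"):
        nodes.append({
            "id": node.get("id"),
            "molID": node.get("molID"),
            "nodeName": node.get("nodeName"),
            "nodeType": node.findtext("nodeType"),
            "location": node.findtext("location"),
        })
    edges = []
    for edge in root.iter("edge"):
        strength = edge.get("strength")  # optional attribute
        edges.append({
            "id": edge.get("id"),
            "tail": edge.get("tail"),
            "head": edge.get("head"),
            # Assumed boolean spelling: "true" or "1" means directed.
            "directed": edge.get("directed") in ("true", "1"),
            "strength": float(strength) if strength is not None else None,
            "edgeType": edge.findtext("edgeType"),    # None if absent
            "certainty": edge.findtext("certainty"),  # None if absent
        })
    return nodes, edges


nodes, edges = read_topology(SAMPLE_TOPOLOGY)
print(len(nodes), len(edges))  # 2 1
```

Because the order of entries does not matter, the reader collects nodes and edges independently rather than relying on their position in the file.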

 

Every node has two ID codes, a unique ID and a molecule ID. The unique ID is the internal name for the node and can contain both letters and numbers. No unique ID may be repeated anywhere in the file. The molecule ID identifies the molecule that the node represents. Many nodes may share the same molecule ID. For example, the compound glucose is present in many reactions and many places in a cell. As a result, there are multiple glucose nodes, each of which has a different unique ID but the same molecule ID. All reaction nodes have the same molecule ID (generally “reaction”). The molecule ID can be made of letters, numbers, or both.
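
The difference between the two ID codes can be shown with a short sketch. The ID values and both function names below are hypothetical; the code only illustrates the two rules stated above (unique IDs never repeat, molecule IDs may be shared).

```python
from collections import Counter, defaultdict

# Hypothetical (unique ID, molecule ID) pairs as they might be read from topology.xml.
NODE_IDS = [
    ("n1", "glucose"),
    ("n2", "glucose"),
    ("r1", "reaction"),
    ("r2", "reaction"),
]


def duplicate_unique_ids(pairs):
    """Return the unique IDs that occur more than once, sorted."""
    counts = Counter(unique_id for unique_id, _ in pairs)
    return sorted(uid for uid, n in counts.items() if n > 1)


def group_by_molecule(pairs):
    """Map each molecule ID to the list of unique IDs that share it."""
    groups = defaultdict(list)
    for unique_id, mol_id in pairs:
        groups[mol_id].append(unique_id)
    return dict(groups)


print(duplicate_unique_ids(NODE_IDS))  # []
print(group_by_molecule(NODE_IDS))
# {'glucose': ['n1', 'n2'], 'reaction': ['r1', 'r2']}
```

An empty result from duplicate_unique_ids means the file satisfies the uniqueness rule; shared molecule IDs are expected and are not an error.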

 

 

contents.xml file format

contents.xml is the shortest and simplest XML file of the data set. It contains background information about the data included in the zipped archive, along with boolean values that indicate whether the optional files (pathways, index, or extended) are present in the archive.


contents.xml consists of a single fileSet entry made up of:

· attribute topologyPresent - boolean (whether the file is present) required

· attribute indexPresent - boolean (whether the file is present) required

· attribute pathwaysPresent - boolean (whether the file is present) required

· attribute extendedPresent - boolean (whether the file is present) required

· element createdBy - string (person(s) that created the archive) zero or more

· element dateCreated - date (date the archive was created) zero or one

· element projectName - string (name of the project that created the archive) zero or one

· element institution - string (institution(s) that created the archive) zero or more

· element description - string (project/data description; may be multiple paragraphs) zero or one

· element organism - string (organism(s) the data comes from) zero or more

· element dataSource - string (source(s) of the data [MetNet database, etc.]) zero or more
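
A contents file following this listing might be read as sketched below. This is an illustration only: it assumes fileSet is the document's root element, that booleans are written as "true"/"false" (or "1"/"0"), and that dates are written in ISO form; the sample values and the function name read_contents are hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample; names of people and values are placeholders.
SAMPLE_CONTENTS = """\
<fileSet topologyPresent="true" indexPresent="true"
         pathwaysPresent="false" extendedPresent="false">
  <createdBy>A. Researcher</createdBy>
  <createdBy>B. Researcher</createdBy>
  <dateCreated>2003-01-01</dateCreated>
  <organism>example organism</organism>
</fileSet>
"""

PRESENCE_FLAGS = ("topologyPresent", "indexPresent",
                  "pathwaysPresent", "extendedPresent")
REPEATABLE = ("createdBy", "institution", "organism", "dataSource")  # zero or more
SINGLE = ("dateCreated", "projectName", "description")               # zero or one


def read_contents(xml_text):
    """Return the fileSet entry as a dictionary, checking required attributes."""
    root = ET.fromstring(xml_text)
    if root.tag != "fileSet":
        raise ValueError(f"expected a fileSet entry, found {root.tag}")
    info = {}
    for flag in PRESENCE_FLAGS:
        value = root.get(flag)
        if value is None:
            raise ValueError(f"missing required attribute: {flag}")
        info[flag] = value in ("true", "1")
    for tag in REPEATABLE:
        info[tag] = [element.text for element in root.findall(tag)]
    for tag in SINGLE:
        info[tag] = root.findtext(tag)  # None if the element is absent
    return info


info = read_contents(SAMPLE_CONTENTS)
print(info["indexPresent"], info["pathwaysPresent"])  # True False
print(info["createdBy"])  # ['A. Researcher', 'B. Researcher']
```

Elements that may occur more than once are returned as lists, so an absent element yields an empty list rather than an error.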

 

 

index.xml file format

index.xml contains information on alternative names for the molecules represented by the nodes in the graph and summarizes some of the information from topology.xml. The molecule ID from every node in the topology file (except reaction nodes) has a single entry in index.xml.

 

Nested within each molecule ID’s entry will be a listing of synonyms and abbreviations for the molecule (other than the default nodeName) and a list of all the locations in which the molecule occurs in the graph.

 

index.xml consists of molecule listings, one for each unique molecule ID (except reaction nodes) in the associated topology.xml. Each molecule entry is composed of:

· attribute molID - string (molecule ID of the molecule) required

· attribute nodeName - string (default node name, same as in topology.xml) required

· element abbrev - string (abbreviation for the molecule) zero or more

· element synonym - string (synonyms of the molecule’s name) zero or more

· element location - string (all locations where the molecule is found) zero or more
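
Because index.xml summarizes topology.xml, the molID, nodeName, and location parts of each molecule entry can in principle be derived from the node listings. The sketch below shows one way to do that. It assumes that reaction nodes are recognized by the molecule ID "reaction", that each location is written as its own location element, and that the root and entry elements are named index and molecule; none of these is fixed by this description, and both function names are hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical node records as they might be read from topology.xml.
NODES = [
    {"molID": "glucose", "nodeName": "Glucose", "location": "cytosol"},
    {"molID": "glucose", "nodeName": "Glucose", "location": "plastid"},
    {"molID": "reaction", "nodeName": "hexokinase reaction", "location": "cytosol"},
]


def summarize_molecules(nodes, reaction_mol_id="reaction"):
    """Collect one entry per molecule ID, skipping reaction nodes."""
    entries = {}
    for node in nodes:
        mol_id = node["molID"]
        if mol_id == reaction_mol_id:
            continue
        entry = entries.setdefault(
            mol_id, {"nodeName": node["nodeName"], "locations": set()}
        )
        entry["locations"].add(node["location"])
    return {
        mol_id: {"nodeName": e["nodeName"], "locations": sorted(e["locations"])}
        for mol_id, e in entries.items()
    }


def to_index_xml(summary):
    """Serialize the summary; <index> and <molecule> are assumed element names."""
    root = ET.Element("index")
    for mol_id, entry in summary.items():
        molecule = ET.SubElement(
            root, "molecule", molID=mol_id, nodeName=entry["nodeName"]
        )
        for location in entry["locations"]:
            ET.SubElement(molecule, "location").text = location
    return ET.tostring(root, encoding="unicode")


summary = summarize_molecules(NODES)
print(summary)
# {'glucose': {'nodeName': 'Glucose', 'locations': ['cytosol', 'plastid']}}
print(to_index_xml(summary))
```

Abbreviations and synonyms cannot be derived this way, since they do not appear in topology.xml; they would have to come from another source.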

 

 

pathways.xml file format

pathways.xml contains information on the named, biologically significant pathways present in the data set. Each pathway is made of nodes and edges from topology.xml, all of which are listed as member entries. Multiple names (synonyms) may be included for each pathway in this file; all other information about the pathway (references, etc.) is contained in extended.xml.


pathways.xml consists of pathway listings, each of which is composed of:

· attribute id - string (pathway unique ID) required

· attribute pathwayName - string (default name for the pathway) required

· element synonym - string (other name(s) for the pathway) zero or more

· element member - string (unique ID of a node or edge in the pathway) one or more
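
Since every member must name a node or edge from topology.xml, a simple cross-check between the two files is possible. The sketch below is an illustration under assumptions: the root element name pathways and the entry element name pathway are not fixed by this description, the sample IDs and names are hypothetical, and unknown_members is not an existing MetNet function.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample; <pathways> and <pathway> are assumed element names.
SAMPLE_PATHWAYS = """\
<pathways>
  <pathway id="p1" pathwayName="example pathway">
    <synonym>another name</synonym>
    <member>n1</member>
    <member>e1</member>
    <member>n9</member>
  </pathway>
</pathways>
"""


def unknown_members(pathways_xml, topology_ids):
    """Map each pathway ID to its members that are not unique IDs in topology.xml."""
    root = ET.fromstring(pathways_xml)
    missing = {}
    for pathway in root.iter("pathway"):
        bad = []
        for member in pathway.findall("member"):
            member_id = (member.text or "").strip()
            if member_id not in topology_ids:
                bad.append(member_id)
        if bad:
            missing[pathway.get("id")] = bad
    return missing


# Unique IDs of the nodes and edges found in a (hypothetical) topology.xml.
print(unknown_members(SAMPLE_PATHWAYS, {"n1", "n2", "e1"}))  # {'p1': ['n9']}
```

An empty dictionary means every member in every pathway refers to an existing node or edge.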