
NAME	
	nxingest - 1.4
		
USAGE
	nxingest mapping_file nexus_file [output_file]
	
DESCRIPTION
	nxingest extract the metadata from a NeXus file to create an XML file 
	according to a mapping file. 
	
	The mapping file will defines the structure (names and hierarchy) and 
	content (from the NeXus file, from the mapping file or from the current time)
	of the oputput file. See below for a description of the maping file.
	
	This tool use the NeXus api so any of the supported format (HDF4, HDF5 
	and XML) can be read. 
	
	The output file parameter is optional. if not present nxingest will write 
	the results into output.xml	
	
	To be accepted by ICAT, the output XML should match the ICAT3 XML schema.
	See http://nile.dl.ac.uk/trac/isis/browser/Software/icat_xsd/icatXSD.xsd 
	NB: This work is ongoing; the schema hasn't been released yet.   
	 
	This tool use the NeXus api so any of the supported format (HDF4, HDF5 
	and XML) can be read. 
	
	nxingest can retrive metadata from the neXus file, from the mapping file 
	(fix parameter). it can also retrive the actual time. 
	
	If the metadata is a date or time, nxingest may modify it to extract only 
	part of the data (e.g. the year) or reformat it to have consistant dates 
	into ICAT.      
	
	NeXus syntax.
	-------------
	NeXus Data is divided in different classes that hold data sets. The data 
	sets may hold any type of data from a single byte to unlimited dimension 
	arrays. The data sets and the classes may also have attributes.
	
	To collect data from a neXus file, you have to build the path to the data 
	you want.
	
	- Simple dataset (singular string or number)
	The path is the name of the different classes separated by '/'the last name 
	is the name of the dataset.
	e.g. /run/title
	
	- Attribute (singular string or number)
	The attribute name is separated from the dataset by a '.'
	e.g. /run/data.units
	
	- Arrays
	Most of the data will be stored as multi dimensional arrays. We may want to 
	extract particular information from the data. 
		- Specific value from an array
		A null or positive number between square brackets after the data set 
		name. nxingest consider all dataset an uni-dimension.  
		e.g. /run/data_array[3] 
		
		- Derived value
		nxingest may derived a few value from an array. To express that, you 
		have to put the name of the derived parameter between square brackets. 
		Available values are : 
			[AVG] Average
			[STD] Standard Deviation
			[MIN] Minimum Value
			[MAX] Maximum Value
			[SUM] Sum of all values
		e.g. /run/sample/temperature_log/value[AVG]	
		
	- Generic classes
	NeXus defined generic classes type that user can name freely. nxingest can 
	use some of these to generalise the mapping files for similar instrument. 
	By Writing the class type under rounded brackets like {NXentry} the program 
	will substitue it with the actual class name from the current file. 
	This is currenlty only available for {NXentry}, {NXinstrument} and {NXuser}
	e.g. /{NXentry}/{NXinstrument}/source/name
	is equivalent to  
		/run/MUSR/source/name
		/entry_0/I18/source/name
	Also there may be more than 1 user define in a NeXus file. nxingest will 
	loop over each of them if the mapping include a special node 'user_tbl'.   

	Mapping File
	------------
	The mapping file is an xml document that will control the execution of 
	nxingest, defining the structure and content of the output of nxingest.  
	
	nxingest will scan the mapping file analysing all the element nodes it find.  
	
	There are 3 major types of node : 
	
	1) Table node that define the hierarchy of the output document.
		e.g. the mapping : 	<icat type="tbl">
								<study type="tbl">
									<investigation type="tbl" trusted="false">
		is mapped into : 	<icat>
								<study>
									<investigation  trusted="false">		
	1') User Table node is a specific case where the node is scan several time 
	according to the number of {NXuser} type classes are present in the neXus 
	file. at each iteration, nxingest will replace the string {NXuser} by the 
	correct name found in the file.								
	e.g. the mapping :		<investigation type="tbl">
								<user type="user_tbl"> ... </user>
		is mapped into : 	<investigation>
								<user> ... </user>
								<user> ... </user>
								<user> ... </user>
								
	2) 'Tag' node which define a simple metadata record. It has 2 child node 
		that contain the name of the output element and 
		e.g. the mapping : 	<record>
								<icat_name>name</icat_name>
								<value type="nexus"> path_to_metadata </value>
							</record>
		is mapped into : 	<name> metadata_from_nexus_file </name
		
	3) Parameter node define one row of one of the ICAT PARAMETER table. 
	   The different column names are defined so we only have to extract the 
	   necessary metadata.	
	   It should be possible to perform the same task with only the node type 
	   1 and 2 but this simplify the presentation. After all, the aim is to 
	   create an document that can be ingested into ICAT3 and this 
		e.g. the mapping : 	<parameter type="param_str">
								<icat_name> hdf5_version </icat_name>
								<value type="nexus"> /.HDF5_Version </value>
								<description type="fix"> HDF5 Version used in \
								             creating the file. </description>
							</parameter>
		is mapped into : 	<parameter>
								<name> hdf5_version </name>
								<string_value> 3.5 </string_value>
								<description> HDF5 Version used in creating the\
								              file. </description>
							</parameter>
		

	Metadata Sources
	----------------
	the source of the metadata is defined by nodes of type 'fix', 'nexus', 
	'special' and 'mix'. if the type is special. the begining of the text will 
	contain a modifier (fix:, nexus:, time:or sys: ) The value is then the text 
	without the modifier. 
	
	1) Fix string from the mapping file itself. 
		- Node type fix		
		- Node type : special; modifier fix:
		
	2) From the NeXus file.
		- Node type nexus		
		- Node type : special; modifier nexus:	
		- Node type : special; modifier time:nexus(...)	 See below.

	3) Time	
		- Node type : special; modifier time:	
		Time can be expressed in multiple format, so the the value after the 
		modifier will be composed in 3 parts : 
			time:source ; input_format; output format
		
		- The source can be 'now' for the current time or 'nexus()' with the path 
		  to the time string between the parenthesis.
		- input and output format are optional. The s/w expect an integer. 
		  Currently the possible values are 
				0		'2007-05-23T12:48:05'	(default)
				1		'2007-05-23 12:48:05'	
				2		'2007-05-23'
				3		'12:48:05'
				4		'20070523'
				5		'200705'
				6		'2007'
				7		'23/05/2007'
	4) System
		- sys:filename gives the filename of the NeXus file. 
		- sys:location gives the path of the NeXus file.
		- sys:size gives the size in bytes of the NeXus file.
	
	5) Mix of all 3. 
		- Node type : mix; 
		To combine several sources, several modifiers used with node type 
		'special' are used separated with '|'.
		e.g. <value type="mix"> nexus:/{NXentry}/{NXinstrument}.short_name | 
		                        fix:_ | time:now ; 0 ; 5 </value>
		should create something like : 'MUSR_2007' 
		
		
	
EXAMPLE	
	
	Here is an example of the minimal xml document that can be accepted by ICAT.
	
		<?xml version="1.0" encoding="ISO-8859-1"?>
		<icat version="1.0 RC6">
			<study>
				<investigation>
					<inv_number>000001</inv_number>
					<visit_id>01</visit_id>
					<instrument>MUSR</instrument>
					<investigator>
						<user_id>ll56</user_id>
						<role>tester, developer</role>
					</investigator>
					<dataset>
						<name>test-1</name>
						<dataset_type>EXPERIMENT_RAW</dataset_type>
						<datafile>
							<name>file_01</name>
							<location>/test/inv_1/file_01.jpg</location>
						</datafile>
					</dataset>
				</investigation>
			</study>
		</icat>
	
	And the mapping file to create such document. 
	
		<?xml version="1.0" encoding="ISO-8859-1"?>
		<icat type="tbl">
			<study type="tbl">
				<investigation type="tbl">
					<record>
						<icat_name>inv_number</icat_name>
						<value type="nexus"> path_to_metadata </value>
					</record>
					<record>
						<icat_name>visit_id</icat_name>
						<value type="fix"> 1 </value>
					</record>
					<record>
						<icat_name>instrument</icat_name>
						<value type="nexus"> path_to_metadata </value>
					</record>
					<investigator type="tbl">
						<record>
							<icat_name>user_id</icat_name>
							<value type="nexus"> path_to_metadata </value>
						</record>
						<record>
							<icat_name>role</icat_name>
							<value type="nexus"> path_to_metadata </value>
						</record>
					</investigator>
					<dataset type="tbl">
						<record>
							<icat_name>name</icat_name>
							<value type="nexus"> path_to_metadata </value>
						</record>
						<record>
							<icat_name>dataset_type</icat_name>
							<value type="fix"> EXPERIMENT_RAW </value>
						</record>					
						<datafile type="tbl">
						<record>
							<icat_name>name</icat_name>
							<value type="nexus"> path_to_metadata </value>
						</record>
						<record>
							<icat_name>location</icat_name>
							<value type="nexus"> path_to_metadata </value>
						</record>
						</datafile>
					</dataset>
				</investigation>
			</study>
		</icat>
	
	
KNOWN ISSUES
	No test of Ingestion into ICAT has been done so far. 
	
	Error handling and logging should be improved. 
	
	The list of time format is relatively limited, it should be expanded. 
	
	Only the columns NAME, UNITS, STRING_VALUE or NUMERIC_VALUE and DESCRIPTION
	from the parameter table can be filled at the moment. the column RANGE_TOP, 
	RANGE_BOTTOM and ERROR are currently ignored.  	

HISTORY
		Version 1.0		05/06/2007		First version. 

		Version 1.1		13/06/2007		Remove trailing white space around string. 
		
		Version 1.2		16/08/2007		
			Bug correction: if the string is too short for the expected transformation, 
			an out of bound exception would have been raised by substr  
					
		Version 1.3		31/08/2007		
			Bug correction in test. (Logical OR and not bitwise OR)
			Add a filesize calculation function. 

		Version 1.4		04/09/2007	
			The parameters will not be created if there is no value attached to them. 
					
DEPENDANCIES

	- NeXux Library. 
	- mxml library version 2.2.2 
		You may have to change the MXML_WRAP constant in mxml.h to a much 
		higher value to avoid carriage return to appear in the xml document. 

AUTHOR
	Laurent Lerusse
		Science and Technology Facility Council
		e-Science Center - Data Management Group
		e-mail : l.lerusse@stfc.ac.uk

COPYRIGHT
    nxingest 1.4 - extraction of metadata from NeXus files into a xml document
    Copyright (C) 2007  STFC e-Science Center

    This program is free software; you can redistribute it and/or modify
    it under the terms of the GNU General Public License as published by
    the Free Software Foundation; version 2 of the License.
	
    This program is distributed in the hope that it will be useful,
    but WITHOUT ANY WARRANTY; without even the implied warranty of
    MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
    GNU General Public License for more details.

    You should have received a copy of the GNU General Public License along
    with this program; if not, write to the Free Software Foundation, Inc.,
    51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA.