README
Examples of using ValuesFromTagPath().

A number of interesting examples have shown up in the gonuts discussion group that could be handled - after a fashion - using the ValuesFromTagPath() function.

gonuts1.go -

Here we see that the message stream has a problem with multiple tag spellings, though the message structure remains constant. In this example we 'anonymize' the tag with the variant spellings.

    mm := x2j.ValuesFromTagPath(doc,"data.*")
    where '*' is any possible spelling - "netid" or "idnet"
    and the result is a list with 1 member of map[string]interface{} type.

Once we've retrieved the map, we can parse it using the known keys - "disable", "text1" and "word1".

gonuts2.go -

This is an interesting case where there was a need to handle messages with lists of "ClaimStatusCodeRecord" entries as well as messages with NONE. (Here we see some of the vagaries of dealing with mixed messages that are verging on becoming anonymous.)

    strXml  - the message with two ClaimStatusCodeRecord entries
    strXmlA - the message with one ClaimStatusCodeRecord entry
    strXml2 - the message with NO ClaimStatusCodeRecord entries

ValuesFromTagPath() options:

    path == "Envelope.Body.GetClaimStatusCodesResponse.GetClaimStatusCodesResult.ClaimStatusCodeRecord"
        for doc == strXml:
            returns: a list - []interface{} - with two values of map[string]interface{} type
        for doc == strXmlA:
            returns: a list - []interface{} - with one map[string]interface{} type
        for doc == strXml2:
            returns 'nil' - no values

    path == "*.*.*.*.*"
        for doc == strXml:
            returns: a list - []interface{} - with two values of map[string]interface{} type

    path == "*.*.*.*.*.Description"
        for doc == strXml:
            returns: a list - []interface{} - with two values of string type, the individual
            values from parsing the two map[string]interface{} values where key=="Description"

    path == "*.*.*.*.*.*"
        for doc == strXml:
            returns: a list - []interface{} - with six values of string type, the individual
            values from parsing all keys in the two map[string]interface{} values

IN GENERAL

Think of the wildcard character "*" as anonymizing the tag in the position of the path where it occurs. Suppose we have a message that looks like this:

    var doc = `
    <books>
        <book seq="1">
            <author>William H. Gaddis</author>
            <title>The Recognitions</title>
            <review>One of the great seminal American novels of the 20th century.</review>
        </book>
        <book seq="2">
            <author>Austin Tappan Wright</author>
            <title>Islandia</title>
            <review>An example of earlier 20th century American utopian fiction.</review>
        </book>
        <book seq="3">
            <author>John Hawkes</author>
            <title>The Beetle Leg</title>
            <review>A lyrical novel about the construction of Ft. Peck Dam in Montana.</review>
        </book>
        <book seq="4">
            <author>T.E. Porter</author>
            <title>King's Day</title>
            <review>A magical novella.</review>
        </book>
    </books>
    `

    path == "books"
        return a list with one member, map[string]interface{}, with a single key 'book'
        whose value is a list.

    path == "books.book"
        return a list of four members, map[string]interface{} - each map[string]interface{}
        is a 'book' entry

    path == "books.*"
        return a list of four members, map[string]interface{} - each map[string]interface{}
        is a 'book' entry [the same as "books.book"]

    path == "books.*.title"
        return a list of four members, interface{} - interface{} is the title of a book
        in the list of 'books.book' [the same as "books.book.title"]

    path == "books.*.*"
        return a list of twelve members, interface{} - interface{} is the title, author,
        or review of a book in the list of 'books.book' [the same as "books.book.*"]

NOTES.

Attributes.

By default interface{} values do not include attributes, though they do show up in maps. So, if you're looking for particular values based on attribute values, retrieve the entries as map[string]interface{} values and parse them based on attribute key values. (Attribute keys have '-' prepended to their name.)
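As a rough illustration of parsing entries by attribute key, the sketch below filters a list of maps on an attribute value. It does not call x2j at all: the list is a hand-built stand-in for what a "books.book" lookup with attributes included might return, findByAttr is a hypothetical helper, and it assumes attribute values arrive as strings.

```go
package main

import "fmt"

// findByAttr (hypothetical helper) returns the map entries whose attribute
// attr equals want. Attribute keys carry the '-' prefix noted above, so the
// attribute "seq" is looked up under the key "-seq".
func findByAttr(entries []interface{}, attr, want string) []map[string]interface{} {
	var out []map[string]interface{}
	for _, e := range entries {
		m, ok := e.(map[string]interface{})
		if !ok {
			continue // not a map entry; nothing to inspect
		}
		// Assumption: attribute values are plain strings.
		if v, ok := m["-"+attr].(string); ok && v == want {
			out = append(out, m)
		}
	}
	return out
}

func main() {
	// Hand-built stand-in for a "books.book" result that includes attributes.
	books := []interface{}{
		map[string]interface{}{"-seq": "1", "author": "William H. Gaddis", "title": "The Recognitions"},
		map[string]interface{}{"-seq": "2", "author": "Austin Tappan Wright", "title": "Islandia"},
	}
	for _, b := range findByAttr(books, "seq", "2") {
		fmt.Println(b["title"]) // prints "Islandia"
	}
}
```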
The gonuts3.go example illustrates extracting all attribute values in a doc associated with a specific tag value.

Lists of entries vs. single entries.

The gonuts2.go example highlights some of the ambiguity that must be managed when dealing with XML messages anonymously using the x2j package.

    - for strXml "ClaimStatusCodeRecord" has a []interface{} value
    - for strXmlA "ClaimStatusCodeRecord" has a map[string]interface{} value

ValuesFromTagPath() tries to normalize this by returning the []interface{} value as separate list members rather than the absolute value in the map representation. So the same logic can be used to parse the entries no matter how many values are returned - other than 'nil'.

"...ClaimStatusCodeRecord.*" will return all [non-attribute] values without their tags, so it's a little hard to know what you've got. It is recommended that the wildcard - "*" - not be used at the end of a path for that very reason.

EAT YOUR OWN DOG FOOD ...

I needed to convert a large (14.9 MB) XML data set from an Eclipse metrics report on an application that had 355,100 lines of code in 211 packages into CSV data sets. The report included application-, package-, class- and method-level metrics reported in an element, "Value", with varying attributes.

    <Value value=""/>
    <Value name="" package="" value=""/>
    <Value name="" source="" package="" value=""/>
    <Value name="" source="" package="" value="" inrange=""/>

In addition, the metrics were reported with two different "Metric" compound elements:

    <Metrics>
        <Metric id="" description="">
            <Values>
                <Value.../>
                ...
            </Values>
        </Metric>
        ...
        <Metric id="" description="">
            <Value.../>
        </Metric>
        ...
    </Metrics>

Using the x2j package seemed a more straightforward approach than using Go vernacular and the standard xml package. I wrote the program getmetrics.go to do this.
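To make the two "Metric" shapes concrete, here is a minimal sketch of how the "Value" entries might be pulled out of either one. It is not taken from getmetrics.go: asMaps and metricValues are hypothetical helpers, the maps are hand-built, and the sketch assumes a single child element is represented as a map while repeated children are represented as a list, as described for ClaimStatusCodeRecord.

```go
package main

import "fmt"

// asMaps (hypothetical helper) normalizes a value that may be a single map,
// a list of maps, or absent into a slice of maps.
func asMaps(v interface{}) []map[string]interface{} {
	switch t := v.(type) {
	case map[string]interface{}:
		return []map[string]interface{}{t}
	case []interface{}:
		var out []map[string]interface{}
		for _, e := range t {
			if m, ok := e.(map[string]interface{}); ok {
				out = append(out, m)
			}
		}
		return out
	}
	return nil // key missing or of an unexpected type
}

// metricValues (hypothetical helper) returns the "Value" entries of one
// "Metric" map, whether they sit under a "Values" wrapper or directly
// under the metric.
func metricValues(metric map[string]interface{}) []map[string]interface{} {
	if wrapper, ok := metric["Values"].(map[string]interface{}); ok {
		return asMaps(wrapper["Value"])
	}
	return asMaps(metric["Value"])
}

func main() {
	// Hand-built stand-ins for the two shapes; the attribute values are made up.
	wrapped := map[string]interface{}{
		"-id": "VG",
		"Values": map[string]interface{}{
			"Value": []interface{}{
				map[string]interface{}{"-name": "a", "-value": "1"},
				map[string]interface{}{"-name": "b", "-value": "2"},
			},
		},
	}
	direct := map[string]interface{}{
		"-id":   "TLOC",
		"Value": map[string]interface{}{"-value": "355100"},
	}
	fmt.Println(len(metricValues(wrapped)), len(metricValues(direct))) // prints "2 1"
}
```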
Note that the call to ValuesFromKeyPath()

    metricVals := x2j.ValuesFromKeyPath(m, "Metrics.Metric", true)

requests that attributes be returned by including 'true' as the third argument. After that everything is pretty straightforward.

The timings for processing over 120,000 "Value" elements (Intel Core i7, 8 GB, OS X 10.8, Solid State drive) are as follows.

    2013-07-24 06:59:54.97486625 -0500 CDT ... File Opened: static_analysis.xml
    2013-07-24 06:59:54.982837648 -0500 CDT ... File Read - size: 14863568
    2013-07-24 06:59:58.656195538 -0500 CDT ... XML Unmarshaled - len: 1
    2013-07-24 06:59:58.65622715 -0500 CDT ... ValuesFromKeyPath - len: 23
    2013-07-24 06:59:58.657157356 -0500 CDT id: VG desc: McCabe Cyclomatic Complexity len(Values): 24562
    2013-07-24 06:59:58.996143039 -0500 CDT id: PAR desc: Number of Parameters len(Values): 24562
    2013-07-24 06:59:59.333407792 -0500 CDT id: NBD desc: Nested Block Depth len(Values): 24562
    2013-07-24 06:59:59.672552235 -0500 CDT id: CA desc: Afferent Coupling len(Values): 211
    2013-07-24 06:59:59.675736869 -0500 CDT id: CE desc: Efferent Coupling len(Values): 211
    2013-07-24 06:59:59.678760375 -0500 CDT id: RMI desc: Instability len(Values): 211
    2013-07-24 06:59:59.682004251 -0500 CDT id: RMA desc: Abstractness len(Values): 211
    2013-07-24 06:59:59.684985741 -0500 CDT id: RMD desc: Normalized Distance len(Values): 211
    2013-07-24 06:59:59.688191982 -0500 CDT id: DIT desc: Depth of Inheritance Tree len(Values): 2074
    2013-07-24 06:59:59.717945046 -0500 CDT id: WMC desc: Weighted methods per Class len(Values): 2074
    2013-07-24 06:59:59.747678025 -0500 CDT id: NSC desc: Number of Children len(Values): 2074
    2013-07-24 06:59:59.778378946 -0500 CDT id: NORM desc: Number of Overridden Methods len(Values): 2074
    2013-07-24 06:59:59.80913273 -0500 CDT id: LCOM desc: Lack of Cohesion of Methods len(Values): 2074
    2013-07-24 06:59:59.838893489 -0500 CDT id: NOF desc: Number of Attributes len(Values): 2074
    2013-07-24 06:59:59.868573348 -0500 CDT id: NSF desc: Number of Static Attributes len(Values): 2074
    2013-07-24 06:59:59.89918479 -0500 CDT id: NOM desc: Number of Methods len(Values): 2074
    2013-07-24 06:59:59.92889002 -0500 CDT id: NSM desc: Number of Static Methods len(Values): 2074
    2013-07-24 06:59:59.959311932 -0500 CDT id: SIX desc: Specialization Index len(Values): 2074
    2013-07-24 06:59:59.988859435 -0500 CDT id: NOC desc: Number of Classes len(Values): 211
    2013-07-24 06:59:59.992076575 -0500 CDT id: NOI desc: Number of Interfaces len(Values): 211
    2013-07-24 06:59:59.995418021 -0500 CDT id: NOP desc: Number of Packages len(Value): 1
    2013-07-24 06:59:59.995971956 -0500 CDT id: TLOC desc: Total Lines of Code len(Value): 1
    2013-07-24 06:59:59.996635113 -0500 CDT id: MLOC desc: Method Lines of Code len(Values): 24562
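For the CSV step itself, one possible shape is sketched below using the standard encoding/csv package. This is not the code from getmetrics.go: toCSV is a hypothetical helper, the column list and sample values are made up, and it assumes each "Value" entry is a map whose attributes are stored as strings under '-'-prefixed keys. Because the "Value" elements carry varying attributes, a missing attribute simply becomes an empty cell.

```go
package main

import (
	"encoding/csv"
	"fmt"
	"strings"
)

// toCSV (hypothetical helper) writes a header row followed by one row per
// "Value" map. Each column name is looked up under its '-'-prefixed
// attribute key; a missing attribute becomes an empty cell.
func toCSV(values []map[string]interface{}, cols []string) (string, error) {
	var sb strings.Builder
	w := csv.NewWriter(&sb)
	if err := w.Write(cols); err != nil {
		return "", err
	}
	for _, v := range values {
		row := make([]string, len(cols))
		for i, c := range cols {
			// Assumption: attribute values are plain strings.
			if s, ok := v["-"+c].(string); ok {
				row[i] = s
			}
		}
		if err := w.Write(row); err != nil {
			return "", err
		}
	}
	w.Flush()
	return sb.String(), w.Error()
}

func main() {
	// Made-up sample entries mimicking two of the "Value" attribute variants.
	values := []map[string]interface{}{
		{"-name": "Foo", "-package": "com.example", "-value": "3"},
		{"-value": "7"},
	}
	out, err := toCSV(values, []string{"name", "package", "value"})
	if err != nil {
		fmt.Println("csv error:", err)
		return
	}
	fmt.Print(out)
}
```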