X-Git-Url: https://git.wikimedia.ca/?p=eccc_to_commons.git;a=blobdiff_plain;f=README;h=198387187540ad0681a28ba1da727ee6758e4aa6;hp=9cf1c30a83e588faad07f6019815b565e2823436;hb=HEAD;hpb=842e57ffc32aeaf2818e04ace3f2568009653606

diff --git a/README b/README
index 9cf1c30..1983871 100644
--- a/README
+++ b/README
@@ -21,6 +21,7 @@ dllist.sh                 outputs a curl configuration file listing all availabl
 eccc_fixer.sh             fix upstream data XML files
 eccc_fixer.xslt           fix upstream data XML file
 commons_rules.xsd         validate ECCC XML from a Wikimedian point of view
+eccc_merger.sh            merge multiple ECCC XML files
 eccc_to_commons.sh        transform ECCC XML files into JSON
 monthly_to_commons.xslt   transform ECCC monthly XML file into JSON
 almanac_to_commons.xslt   transform ECCC almanac XML file into JSON
@@ -130,11 +131,25 @@ Same as previously, the output should be empty. Otherwise, you must
 resolve every single problem before continuing.
 
 
+[OPTIONAL STEP] Merge multiple XML files
+Sometimes, per-station granularity is finer than needed. If you want to merge
+two or more XML files, you can use the eccc_merger.sh script:
+
+  $ ./eccc_merger.sh "${ECCC_CACHE}/almanac/3050519.xml" \
+    "${ECCC_CACHE}/almanac/3050520.xml" "${ECCC_CACHE}/almanac/3050521.xml" \
+    "${ECCC_CACHE}/almanac/3050522.xml" "${ECCC_CACHE}/almanac/3050526.xml" \
+    > banff.xml
+
+To find station IDs based on their geographical position, you can use
+the eccc_map tool. A public instance is hosted online at
+https://stations.wikimedia.ca/ .
+
+
 4. Transform data into target format
 Here we are, here is the fun part: let's create weather data in Wikimedia
 Commons format.
 
-  $ ./eccc_to_commons "${ECCC_CACHE}" "${COMMONS_CACHE}" 2>log
+  $ ./eccc_to_commons.sh "${ECCC_CACHE}" "${COMMONS_CACHE}" 2>log
 
 It will replicate the future Commons content paths inside nested directories.
 So, for example future
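
The optional merge step lists every cached station file by hand. The following is a minimal sketch of a wrapper that builds that list from station IDs. It assumes, as the example command suggests, that eccc_merger.sh takes the input files as arguments and writes the merged XML to standard output, and that almanac files live at "${ECCC_CACHE}/almanac/<station id>.xml". The function name merge_stations and the MERGER override variable are hypothetical and not part of the repository. The wrapper refuses to run if any cached file is missing, so no empty output file is left behind, and it calls the merger exactly once with all files.

```shell
#!/bin/sh
# merge_stations OUTPUT ID...
# Hypothetical helper, not shipped with eccc_to_commons. Assumes the cache
# layout "${ECCC_CACHE}/almanac/<station id>.xml" and a merger that writes
# the merged XML to standard output. MERGER is a hypothetical override that
# defaults to ./eccc_merger.sh. POSIX sh has no local variables, so out, n,
# i and f are global.
merge_stations() {
  # Need an output file and at least one station ID.
  if [ "$#" -lt 2 ]; then
    echo "usage: merge_stations OUTPUT ID..." >&2
    return 2
  fi
  out="$1"
  shift
  n=$#
  i=0
  # Rotate the positional parameters: take the first station ID, turn it
  # into a cache path and append it at the end. After n rounds, "$@" holds
  # only paths, in the original order.
  while [ "$i" -lt "$n" ]; do
    f="${ECCC_CACHE}/almanac/$1.xml"
    if [ ! -f "$f" ]; then
      echo "merge_stations: missing $f" >&2
      return 1
    fi
    shift
    set -- "$@" "$f"
    i=$((i + 1))
  done
  # One invocation with every file, so OUTPUT is a single merged document.
  "${MERGER:-./eccc_merger.sh}" "$@" > "$out"
}
```

With this helper loaded, the Banff example would become:

  $ merge_stations banff.xml 3050519 3050520 3050521 3050522 3050526

Passing all files to a single invocation, rather than piping the list through xargs, matters here: xargs may split a long list across several invocations, which would concatenate several XML documents into one output file.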