Rewrite almanac merge logic

[eccc_to_commons.git] / README
diff --git a/README b/README
index ca371dca55e57823b9fb61ca6608289c4649de8c..198387187540ad0681a28ba1da727ee6758e4aa6 100644
--- a/README
+++ b/README
@@ -12,8 +12,7 @@ distribution. In addition to coreutils, prerequisites are:
  - Xmlstarlet
  - Jq
  
-This repository is sponsored by Environment and Climate change Canada and
-Wikimedia Canada.
+This repository is sponsored by Wikimedia Canada.
  
  
  Provided scripts, ordered by chronological usage:
@@ -22,8 +21,11 @@ dllist.sh                 outputs a curl configuration file listing all availabl
  eccc_fixer.sh             fix upstream data XML files
  eccc_fixer.xslt           fix upstream data XML file
  commons_rules.xsd         validate ECCC XML from a Wikimedian point of view
+eccc_merger.sh            merge multiple ECCC XML files
  eccc_to_commons.sh        transform ECCC XML files into JSON
  monthly_to_commons.xslt   transform ECCC monthly XML file into JSON
+almanac_to_commons.xslt   transform ECCC almanac XML file into JSON
+mediawiki_post.sh         upload a directory to a MediaWiki instance
  
  
  Usage:
@@ -71,8 +73,8 @@ Keep only monthly data:
      -E '^output = ".*/monthly/[A-Z0-9]{7}.xml"$' > downloads_monthly
  
  Remove all downloads before (restart interrupted download):
-       $ sed -n '/https:\/\/climate.weather.gc.ca\/climate_data\/bulk_data_e.html?format=xml&timeframe=3&stationID=2606/,$p' \
-         downloads_all > download_continue
+  $ sed -n '/https:\/\/climate.weather.gc.ca\/climate_data\/bulk_data_e.html?format=xml&timeframe=3&stationID=2606/,$p' \
+    downloads_all > download_continue
  
  
  1.3 Download wanted files
@@ -129,11 +131,25 @@ Same as previously, the output should be empty. Otherwise, you must resolve
  every single problem before continuing.
  
  
+[OPTIONAL STEP] Merge multiple XML files
+Sometimes, per-station granularity is too fine. If you need to merge
+two or more XML files, you can use the eccc_merger.sh script:
+
+  $ ./eccc_merger.sh "${ECCC_CACHE}/almanac/3050519.xml" \
+    "${ECCC_CACHE}/almanac/3050520.xml" "${ECCC_CACHE}/almanac/3050521.xml" \
+    "${ECCC_CACHE}/almanac/3050522.xml" "${ECCC_CACHE}/almanac/3050526.xml" \
+    > banff.xml
+
+To find station IDs based on their geographical position, you can use
+the eccc_map tool. A public instance is hosted online at
+https://stations.wikimedia.ca/ .
+
+
  4. Transform data into target format
  Here we are, here is the fun part: let's create weather data in Wikimedia
  Commons format.
  
-  $ ./eccc_to_commons "${ECCC_CACHE}" "${COMMONS_CACHE}" 2>log
+  $ ./eccc_to_commons.sh "${ECCC_CACHE}" "${COMMONS_CACHE}" 2>log
  
  It will replicate the future Commons content paths inside nested directories.
  So, for example future
@@ -144,4 +160,11 @@ conversion.
  
  
  5. Upload to destination
-Not done yet.
+It's now time to share our work with the world and that's the purpose of the
+mediawiki_post.sh script.
+
+  $ ./mediawiki_post.sh "${COMMONS_CACHE}"
+
+It takes the Commons cache as parameter: its file hierarchy will be replicated
+on Commons. On first run, it will ask for the credentials of the MediaWiki
+account to use to perform the import.
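
Note on the re-indented "restart interrupted download" command: the sed range
'/PATTERN/,$p' prints everything from the first matching line through the end
of the file, which is what lets a download list be resumed from a given
station. A minimal sketch of that behaviour follows. The three-entry list and
the example.org URLs are made up for illustration and are simpler than what
dllist.sh actually emits; only the file names downloads_all and
download_continue come from the README.

```shell
#!/bin/sh
# Sketch only: sample data is hypothetical, not real dllist.sh output.
set -eu

# Work in a throwaway directory so nothing real is overwritten.
workdir=$(mktemp -d)
cd "$workdir"

# Hypothetical, simplified download list with one entry per station.
cat > downloads_all <<'EOF'
url = "https://example.org/bulk?stationID=2605"
url = "https://example.org/bulk?stationID=2606"
url = "https://example.org/bulk?stationID=2607"
EOF

# -n suppresses default printing; the range address starts at the first
# line matching stationID=2606 and ends at the last line ($).
sed -n '/stationID=2606/,$p' downloads_all > download_continue

# Show what is left to download.
cat download_continue
```

Entries before the matching line are dropped, so the station that was
interrupted is downloaded again from the start rather than skipped.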