- Xmlstarlet
- Jq
-This repository is sponsored by Environment and Climate change Canada and
-Wikimedia Canada.
+This repository is sponsored by Wikimedia Canada.
Provided scripts, ordered by chronological usage:
eccc_fixer.sh fix upstream data XML files
eccc_fixer.xslt fix upstream data XML file
commons_rules.xsd validate ECCC XML from a Wikimedian point of view
+eccc_merger.sh merge multiple ECCC XML files
eccc_to_commons.sh transform ECCC XML files into JSON
monthly_to_commons.xslt transform ECCC monthly XML file into JSON
almanac_to_commons.xslt transform ECCC almanac XML file into JSON
+mediawiki_post.sh upload directory to a MediaWiki
Usage:
-E '^output = ".*/monthly/[A-Z0-9]{7}.xml"$' > downloads_monthly
Remove all downloads before a given URL (to restart an interrupted download):
- $ sed -n '/https:\/\/climate.weather.gc.ca\/climate_data\/bulk_data_e.html?format=xml&timeframe=3&stationID=2606/,$p' \
- downloads_all > download_continue
+ $ sed -n '/https:\/\/climate.weather.gc.ca\/climate_data\/bulk_data_e.html?format=xml&timeframe=3&stationID=2606/,$p' \
+ downloads_all > download_continue
1.3 Download wanted files
every single problem before continuing.
+[OPTIONAL STEP] Merge multiple XML files
+Sometimes, per-station granularity is finer than needed. If you need to merge
+two or more XML files, you can use the eccc_merger.sh script:
+
+ $ ./eccc_merger.sh "${ECCC_CACHE}/almanac/3050519.xml" \
+ "${ECCC_CACHE}/almanac/3050520.xml" "${ECCC_CACHE}/almanac/3050521.xml" \
+ "${ECCC_CACHE}/almanac/3050522.xml" "${ECCC_CACHE}/almanac/3050526.xml" \
+ > banff.xml
+
+To find station IDs based on their geographical position, you can use
+the eccc_map tool. A public instance is hosted online at
+https://stations.wikimedia.ca/ .
+
+
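When many stations have to be merged, typing every path by hand gets tedious.
The sketch below is not part of the repository: almanac_paths is a hypothetical
helper that expands a list of station IDs into almanac file paths, assuming the
"${ECCC_CACHE}/almanac/<id>.xml" layout used in the example above.

```shell
# Hypothetical helper (not provided by this repository).
# Prints one almanac file path per station ID given as argument,
# assuming files live in "${ECCC_CACHE}/almanac/<id>.xml".
almanac_paths() {
    for id in "$@"; do
        printf '%s\n' "${ECCC_CACHE}/almanac/${id}.xml"
    done
}
```

It could then be combined with the merger, provided the cache path contains no
whitespace (the command substitution is deliberately left unquoted):

 $ ./eccc_merger.sh $(almanac_paths 3050519 3050520 3050521) > banff.xml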
4. Transform data into target format
Here we are, here is the fun part: let's create weather data in Wikimedia
Commons format.
- $ ./eccc_to_commons "${ECCC_CACHE}" "${COMMONS_CACHE}" 2>log
+ $ ./eccc_to_commons.sh "${ECCC_CACHE}" "${COMMONS_CACHE}" 2>log
It will replicate the future Commons content paths inside nested directories.
So, for example future
5. Upload to destination
-Not done yet.
+It's now time to share our work with the world and that's the purpose of the
+mediawiki_post.sh script.
+
+ $ ./mediawiki_post.sh "${COMMONS_CACHE}"
+
+It takes the Commons cache as its parameter: its file hierarchy will be
+replicated on Commons. On first run, it will ask for the credentials of the
+MediaWiki account to use to perform the import.
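Before uploading, it may be useful to preview what the cache contains. The
sketch below is not part of the repository: list_commons_pages is a
hypothetical helper that lists every file under the cache by its relative
path, on the assumption that each relative path corresponds to one page
replicated on Commons.

```shell
# Hypothetical helper (not provided by this repository).
# Lists files under the given cache directory as sorted relative paths.
# Assumption: each relative path maps to one page on the target wiki.
list_commons_pages() {
    (
        cd "$1" || exit 1
        find . -type f | sed 's|^\./||' | LC_ALL=C sort
    )
}
```

For example:

 $ list_commons_pages "${COMMONS_CACHE}" | less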