
Initial commit for Part 1/2 scripts

Create readme file with scripts description

Fix comments / fix readme

Start Part2

Scripts for IP simulations

Upload data

Note about prepare

Print variables

Fix small discrepancy

Completed data preprocessing script commenting

Small fix titles

Small fix figure numbers and clear import

Add comments

Use new Vostok data. Add amplitudes on figures. Update diu.-seas.var. figure

Use new source of T2 data

Fix 25 hour in T2
master. FGeo committed 1 year ago (committed by FedorSarafanov), commit 4d9483bd11
  1. 471
      0_prepare_data.ipynb
  2. 326
      1_Earlier_measurements_images.ipynb
  3. 1085
      2_Vostok_measurements_images.ipynb
  4. 192
      3_WRF_T2_images.ipynb
  5. 567
      4_IP_simulations_temporal_images.ipynb
  6. 1475
      5_IP_simulations_spatial_images.ipynb
  7. BIN
      data/INMCM/INMCM_HOURLY_TOTAL_IP_1000.npy
  8. BIN
      data/INMCM/INMCM_HOURLY_TOTAL_IP_1200.npy
  9. BIN
      data/INMCM/INMCM_HOURLY_TOTAL_IP_800.npy
  10. BIN
      data/INMCM/INMCM_IP_1000_LATxMON.npy
  11. BIN
      data/INMCM/INMCM_IP_1200_LATxMON.npy
  12. BIN
      data/INMCM/INMCM_IP_800_LATxMON.npy
  13. BIN
      data/INMCM/INMCM_NUMDAYS_MON.npy
  14. 40528
      data/Vostok/.ipynb_checkpoints/vostok_1998_2004_hourly_80percent_all-checkpoint.tsv
  15. 6004
      data/Vostok/.ipynb_checkpoints/vostok_daily_temp-checkpoint.csv
  16. 109365
      data/Vostok/.ipynb_checkpoints/vostok_hourly_from_10_s_without_calibration_and_empty-checkpoint.tsv
  17. BIN
      data/Vostok/.npy
  18. 40528
      data/Vostok/vostok_1998_2004_hourly_80percent_all.tsv
  19. BIN
      data/Vostok/vostok_2006_2020_results.npz
  20. 6004
      data/Vostok/vostok_daily_pressure_mm_hg.csv
  21. 6004
      data/Vostok/vostok_daily_temp.csv
  22. 6004
      data/Vostok/vostok_daily_wind.csv
  23. BIN
      data/Vostok/vostok_diurnal_2006_2020.npy
  24. 118789
      data/Vostok/vostok_hourly_from_10_s_without_calibration_and_empty.tsv
  25. 70352
      data/Vostok/vostok_hourly_from_5_min_without_calibration_and_empty.tsv
  26. BIN
      data/WRF/WRF_HOURLY_TOTAL_IP_1000.npy
  27. BIN
      data/WRF/WRF_HOURLY_TOTAL_IP_1200.npy
  28. BIN
      data/WRF/WRF_HOURLY_TOTAL_IP_500_T2_25.npy
  29. BIN
      data/WRF/WRF_HOURLY_TOTAL_IP_800.npy
  30. BIN
      data/WRF/WRF_IP_1000_LATxMON.npy
  31. BIN
      data/WRF/WRF_IP_1200_LATxMON.npy
  32. BIN
      data/WRF/WRF_IP_500_T2_25_LATxMON.npy
  33. BIN
      data/WRF/WRF_IP_800_LATxMON.npy
  34. BIN
      data/WRF/WRF_NUMDAYS_MON.npy
  35. BIN
      data/WRF/WRF_T2_LATxMON.npy
  36. 102
      readme.md

471
0_prepare_data.ipynb

@@ -0,0 +1,471 @@
{
"cells": [
{
"cell_type": "markdown",
"id": "98e6e23d-5ca7-4706-b1d9-dd57b54888ef",
"metadata": {},
"source": [
"# Data preprocessing for further calculations"
]
},
{
"cell_type": "markdown",
"id": "5324ceb9-24e7-454b-87b9-ba9a717078ae",
"metadata": {},
"source": [
"### Import libraries"
]
},
{
"cell_type": "code",
"execution_count": 2,
"id": "7b2a7f44-b0cb-4471-a0c6-e56da23caf86",
"metadata": {},
"outputs": [],
"source": [
"import datetime as dt\n",
"\n",
"import numpy as np\n",
"import pandas as pd"
]
},
{
"cell_type": "markdown",
"id": "6b2b903f-fa30-4e35-97e1-74bc0ee6b944",
"metadata": {},
"source": [
"### Helper variables"
]
},
{
"cell_type": "code",
"execution_count": 3,
"id": "36b9f49e-32e6-4544-a9d3-f6a8ba49d867",
"metadata": {},
"outputs": [],
"source": [
"# also available at https://eee.ipfran.ru/files/seasonal-variation-2024/\n",
"# attention: the files are very large (~ 350 GB totally)\n",
"src_path = \"../shared_files/eee_public_files/seasonal-variation-2024/\""
]
},
{
"cell_type": "code",
"execution_count": 4,
"id": "78a4350c-59fb-479a-b7cd-e2bf9b996d36",
"metadata": {},
"outputs": [],
"source": [
"# used numbers of simulated days for analysis\n",
"wrf_N_days = 4992\n",
"inmcm_N_days = 3650"
]
},
{
"cell_type": "code",
"execution_count": 5,
"id": "53cb9cc3-0e56-4da4-920b-2f071a0846fb",
"metadata": {},
"outputs": [],
"source": [
"# dates corresponding to the indices (0 axis) of the data arrays\n",
"# note: for WRF dates correspond to real dates\n",
"\n",
"wrf_dt_indicies = np.array(\n",
" [dt.date(1980, 1, 1) + dt.timedelta(i * 3) for i in range(wrf_N_days)]\n",
")\n",
"inmcm_dt_indicies = np.array(\n",
" [dt.date(2022, 1, 1) + dt.timedelta(i % 365) for i in range(inmcm_N_days)]\n",
")"
]
},
{
"cell_type": "markdown",
"id": "5e16ee8e-f3b0-4251-9691-19d7dfd4aff7",
"metadata": {},
"source": [
"### Preprocessing WRF T2m data"
]
},
{
"cell_type": "code",
"execution_count": 18,
"id": "87860fa8-0a9c-4304-9c3c-94561c3e966c",
"metadata": {},
"outputs": [],
"source": [
"# air temperature at the height of 2 m with the shape\n",
"# (number of days, number of hours, number of latitudes, number of longitudes)\n",
"# contains temperature values depending on (d, h, lat, lon)\n",
"# d (axis 0) is the number of a day starting with 0 and ending with 5113\n",
"# every third day is taken\n",
"# d = 0 corresponds to 1 Jan 1980, \n",
"# d = 5113 corresponds to 30 Dec 2021\n",
"# d = 4991 corresponds to 29 Dec 2020\n",
"# (we will restrict our attention to 1980–2020)\n",
"# h (axis 1) is the hour of the day (an integer in [0, 25])\n",
"# the values corresponding to h = 0 and h = 24 are the same\n",
"# lat (axis 2) describes the latitude (an integer in [0, 179]) \n",
"# lon (axis 3) describes the longitude (an integer in [0, 359])\n",
"\n",
"wrf_T2_data = np.load(f\"{src_path}/WRF-T2-MAP.npy\")[:wrf_N_days, :24]\n",
"wrf_T2_data_DAYxLAT = wrf_T2_data.mean(axis=(1, 3))"
]
},
{
"cell_type": "code",
"execution_count": 19,
"id": "1124d9f9-95d9-4c02-8176-82b9c0331d34",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"(4992, 180)"
]
},
"execution_count": 19,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"wrf_T2_data_DAYxLAT.shape"
]
},
{
"cell_type": "code",
"execution_count": 20,
"id": "ec569ffd-93c2-4490-8ba1-69af4fab8f23",
"metadata": {},
"outputs": [],
"source": [
"# air temperature averaged over latitudes and months\n",
"wrf_mon_T2 = np.zeros((180, 12))\n",
"\n",
"for month_idx in range(12):\n",
" # filter indicies by month number\n",
" monthly_indicies = [\n",
" i for i, date in enumerate(wrf_dt_indicies) if date.month == month_idx + 1\n",
" ]\n",
"\n",
" # putting values at specific month into averaged array\n",
" wrf_mon_T2[:, month_idx] = wrf_T2_data_DAYxLAT[monthly_indicies].mean(axis=0)-273.15\n",
"\n",
"np.save(f\"./data/WRF/WRF_T2_LATxMON.npy\",wrf_mon_T2)"
]
},
{
"cell_type": "code",
"execution_count": 17,
"id": "b480c05f-4b06-4d33-9527-dbe2655ed251",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"27.894258059212177"
]
},
"execution_count": 17,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"wrf_mon_T2.max()"
]
},
{
"cell_type": "markdown",
"id": "46d4f093-a420-42c7-b885-a8409d9d8ee4",
"metadata": {},
"source": [
"### Preprocessing INMCM and WRF IP: classic parametrization"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "94a603c3-982d-4c78-be1c-bb6c74b86b5b",
"metadata": {},
"outputs": [],
"source": [
"# dictionaries where processed data is saved\n",
"# the dictionary keys represent the threshold value of CAPE\n",
"\n",
"# for storing arrays with averaged by hours and summarized by longitude,\n",
"# i.e. with dimensions (4992, 180) for WRF and (3650, 120) for INMCM\n",
"wrf_daily_latitudal_ip = {}\n",
"inmcm_daily_latitudal_ip = {}\n",
"\n",
"# for storing arrays summarized by longitude and latitude,\n",
"# i.e. with dimensions (4992, 24) for WRF and (3650, 24) for INMCM\n",
"wrf_hourly_total_ip = {}\n",
"inmcm_hourly_total_ip = {}"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "d8e43c4f-59af-483c-8979-535c696abb4e",
"metadata": {},
"outputs": [],
"source": [
"# iterating over the CAPE threshold (J/kg) values used in modeling \n",
"# for each threshold, there are corresponding model datasets\n",
"for cape_thres in [800, 1000, 1200]:\n",
"\n",
" # grid cell contributions to the IP (not normalised) with the shape\n",
" # (number of days, number of hours, number of latitudes, number of longitudes)\n",
" wrf_raw_ip_data = np.load(f\"{src_path}/WRF-IP-MAP-{cape_thres}.npy\")[:wrf_N_days]\n",
" # modelled using WRF model with CAPE threshold = `cape_thres` J/kg\n",
" # contains values of contributions to the IP depending on (d, h, lat, lon)\n",
" # d (axis 0) is the number of a day starting with 0 and ending with 5113\n",
" # every third day is taken\n",
" # d = 0 corresponds to 1 Jan 1980, \n",
" # d = 5113 corresponds to 30 Dec 2021\n",
" # d = 4991 corresponds to 29 Dec 2020\n",
" # (we will restrict our attention to 1980–2020)\n",
" # h (axis 1) is the hour of the day (an integer in [0, 24])\n",
" # the values corresponding to h = 0 and h = 24 are the same\n",
" # lat (axis 2) describes the latitude (an integer in [0, 179]) \n",
" # lon (axis 3) describes the longitude (an integer in [0, 359])\n",
"\n",
" # discarding the last hour, which duplicates the first one\n",
" wrf_raw_ip_data = wrf_raw_ip_data[:, :24, :, :]\n",
" \n",
" # normalisation of contributions to the IP to the global mean of 240 kV\n",
" wrf_raw_ip_data /= (1/240e3) * wrf_raw_ip_data.sum(axis=(-2,-1)).mean()\n",
"\n",
" # filling dictionaries with averaged arrays\n",
" wrf_daily_latitudal_ip[cape_thres] = wrf_raw_ip_data.mean(axis=1).sum(axis=-1)\n",
" wrf_hourly_total_ip[cape_thres] = wrf_raw_ip_data.sum(axis=(-2, -1))\n",
"\n",
" np.save(f\"./data/WRF/WRF_HOURLY_TOTAL_IP_{cape_thres}.npy\",\n",
" wrf_hourly_total_ip[cape_thres])\n",
"\n",
" # grid cell contributions to the IP (not normalised) reshaped to\n",
" # (number of days, number of hours, number of latitudes, number of longitudes)\n",
" inmcm_raw_ip_data = np.load(f\"{src_path}/INMCM-IP-MAP-{cape_thres}.npy\")\\\n",
" .reshape((inmcm_N_days, 24, 120, 180))\n",
" # modelled using INMCM model with CAPE threshold = `cape_thres` J/kg\n",
" # contains values of contributions to the IP depending on (d, h, lat, lon)\n",
" # d (axis 0) is the number of a day (not correspond to real days,\n",
" # 10 consecutive 365-day years have been simulated)\n",
" # h (axis 1) is the hour of the day (an integer in [0, 23])\n",
" # lat (axis 2) describes the latitude (an integer in [0, 179]) \n",
" # lon (axis 3) describes the longitude (an integer in [0, 359])\n",
"\n",
" # normalisation of contributions to the IP to the global mean of 240 kV\n",
" inmcm_raw_ip_data /= (1/240e3) * inmcm_raw_ip_data.sum(axis=(-2,-1)).mean()\n",
"\n",
" # filling dictionaries with averaged arrays\n",
" inmcm_daily_latitudal_ip[cape_thres] = inmcm_raw_ip_data.mean(axis=1).sum(axis=-1)\n",
" inmcm_hourly_total_ip[cape_thres] = inmcm_raw_ip_data.sum(axis=(-2, -1))\n",
"\n",
" np.save(f\"./data/INMCM/INMCM_HOURLY_TOTAL_IP_{cape_thres}.npy\",\n",
" inmcm_hourly_total_ip[cape_thres])"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "eb28cbc7-eb0a-49be-8cc1-734bba1d06f5",
"metadata": {},
"outputs": [],
"source": [
"# iterating over the CAPE threshold (J/kg) values used in modeling \n",
"# for each threshold, there are corresponding model datasets\n",
"for cape_thres in [800, 1000, 1200]:\n",
"\n",
" # initialization of an arrays to store time-averaged data over months\n",
" wrf_data_LATxMON = np.zeros((180, 12))\n",
" inmcm_data_LATxMON = np.zeros((120, 12))\n",
"\n",
" # iteration over month number (starting with 0)\n",
" for month_idx in range(12):\n",
"\n",
" # filtering day indices belonging to a specific month\n",
" wrf_monthly_indicies = [i for i, date in enumerate(wrf_dt_indicies) \n",
" if date.month == month_idx + 1]\n",
" inm_monthly_indicies = [i for i, date in enumerate(inmcm_dt_indicies) \n",
" if date.month == month_idx + 1]\n",
"\n",
" # filling with modeling values with a CAPE threshold \n",
" # averaged over months of the year\n",
" wrf_data_MONxLAT[:, month_idx] = \\\n",
" wrf_daily_latitudal_ip[cape_thres][monthly_indicies].mean(axis=0)\n",
" inmcm_data_LATxMON[:, month_idx] = \\\n",
" inmcm_daily_latitudal_ip[cape_thres][monthly_indicies].mean(axis=0)\n",
"\n",
" np.save(f\"./data/WRF/WRF_IP_{cape_thres}_LATxMON.npy\",\n",
" wrf_data_MONxLAT)\n",
" np.save(f\"./data/INMCM/INMCM_IP_{cape_thres}_LATxMON.npy\",\n",
" inmcm_data_LATxMON)"
]
},
{
"cell_type": "markdown",
"id": "91bc6d7a-393c-4078-9a6d-1955393d55f5",
"metadata": {},
"source": [
"### Preprocessing WRF IP: new parametrization"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "2b6f987e-ba3c-4371-af7b-c9857a7d33d9",
"metadata": {},
"outputs": [],
"source": [
"# grid cell contributions to the IP (not normalised) reshaped to\n",
"# (number of days, number of hours, number of latitudes, number of longitudes)\n",
"wrf_raw_ip_data = np.load(f\"{src_path}/WRF-IP-MAP-500-T2-25.npy\")[:wrf_N_days]\n",
"# modelled using WRF model using new parametrization based on\n",
"# CAPE and T2 with corresponding thresholds 500 J/kg and 25°C.\n",
"# contains values of contributions to the IP depending on (d, h, lat, lon)\n",
"# d (axis 0) is the number of a day starting with 0 and ending with 5113\n",
"# every third day is taken\n",
"# d = 0 corresponds to 1 Jan 1980, \n",
"# d = 5113 corresponds to 30 Dec 2021\n",
"# d = 4991 corresponds to 29 Dec 2020\n",
"# (we will restrict our attention to 1980–2020)\n",
"# h (axis 1) is the hour of the day (an integer in [0, 24])\n",
"# the values corresponding to h = 0 and h = 24 are the same\n",
"# lat (axis 2) describes the latitude (an integer in [0, 179]) \n",
"# lon (axis 3) describes the longitude (an integer in [0, 359])\n",
"\n",
"# discarding the last hour, which duplicates the first one\n",
"wrf_raw_ip_data = wrf_raw_ip_data[:, :24, :, :]\n",
"\n",
"# normalisation of contributions to the IP to the global mean of 240 kV\n",
"wrf_raw_ip_data /= (1/240e3) * wrf_raw_ip_data.sum(axis=(-2,-1)).mean()\n",
"\n",
"# filling dictionaries with averaged arrays\n",
"wrf_daily_latitudal_ip = wrf_raw_ip_data.mean(axis=1).sum(axis=-1)\n",
"wrf_hourly_total_ip = wrf_raw_ip_data.sum(axis=(-2, -1))\n",
"\n",
"np.save(\n",
" f\"./data/WRF/WRF_HOURLY_TOTAL_IP_500_T2_25.npy\",\n",
" wrf_hourly_total_ip,\n",
")"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "17036c19-95f8-40df-a6c9-f8a23cf426f6",
"metadata": {},
"outputs": [],
"source": [
"# initialization of a array to store time-averaged data over months\n",
"wrf_data_LATxMON = np.zeros((180, 12))\n",
"\n",
"# iteration over month number (starting with 0)\n",
"for month_idx in range(12):\n",
" # filtering day indices belonging to a specific month\n",
" monthly_indicies = [i for i, date in enumerate(wrf_dt_indicies)\n",
" if date.month == month_idx + 1]\n",
"\n",
" # filling with modeling values averaged over months of the year\n",
" wrf_data_LATxMON[:, month_idx] = \\\n",
" wrf_daily_latitudal_ip[monthly_indicies].mean(axis=0)\n",
"\n",
"np.save(f\"./data/WRF/WRF_IP_500_T2_25_LATxMON.npy\", wrf_data_LATxMON)"
]
},
{
"cell_type": "markdown",
"id": "e24297fc-cf81-4ea7-9a80-cdcaf277474a",
"metadata": {},
"source": [
"### Saving number of days (used for monthly mean) for each month"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "894ad630-17a5-4744-907e-a07768ff7848",
"metadata": {},
"outputs": [],
"source": [
"# saving the number of days for each month\n",
"# necessary for correct averaging due to \n",
"# different numbers of days in different months\n",
"\n",
"wrf_days = np.array([len([i for i, date in enumerate(wrf_dt_indicies) \n",
" if date.month == m + 1]) \n",
" for m in range(12)])\n",
"\n",
"inm_days = np.array([len([i for i, date in enumerate(inmcm_dt_indicies) \n",
" if date.month == m + 1]) \n",
" for m in range(12)])\n",
"\n",
"np.save(f\"./data/WRF/WRF_NUMDAYS_MON.npy\", wrf_days)\n",
"np.save(f\"./data/INMCM/INMCM_NUMDAYS_MON.npy\", inm_days)\n",
"\n",
"# for average over months use\n",
"# `(wrf_data_LATxMON[:, :].sum(axis=0)*days).sum()/days.sum()`\n",
"# unstead\n",
"# `wrf_data_LATxMON[:, :].sum(axis=0).mean()`\n",
"# because\n",
"# `((a1+a2+a3)/3 + (b1+b2)/2)/2 != (a1+a2+a3+b1+b2)/5`"
]
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3 (ipykernel)",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.11.2"
}
},
"nbformat": 4,
"nbformat_minor": 5
}

326
1_Earlier_measurements_images.ipynb

File diff suppressed because one or more lines are too long

1085
2_Vostok_measurements_images.ipynb

File diff suppressed because one or more lines are too long

192
3_WRF_T2_images.ipynb

File diff suppressed because one or more lines are too long

567
4_IP_simulations_temporal_images.ipynb

File diff suppressed because one or more lines are too long

1475
5_IP_simulations_spatial_images.ipynb

File diff suppressed because one or more lines are too long

BIN
data/INMCM/INMCM_HOURLY_TOTAL_IP_1000.npy

Binary file not shown.

BIN
data/INMCM/INMCM_HOURLY_TOTAL_IP_1200.npy

Binary file not shown.

BIN
data/INMCM/INMCM_HOURLY_TOTAL_IP_800.npy

Binary file not shown.

BIN
data/INMCM/INMCM_IP_1000_LATxMON.npy

Binary file not shown.

BIN
data/INMCM/INMCM_IP_1200_LATxMON.npy

Binary file not shown.

BIN
data/INMCM/INMCM_IP_800_LATxMON.npy

Binary file not shown.

BIN
data/INMCM/INMCM_NUMDAYS_MON.npy

Binary file not shown.

40528
data/Vostok/.ipynb_checkpoints/vostok_1998_2004_hourly_80percent_all-checkpoint.tsv

File diff suppressed because it is too large Load Diff

6004
data/Vostok/.ipynb_checkpoints/vostok_daily_temp-checkpoint.csv

File diff suppressed because it is too large Load Diff

109365
data/Vostok/.ipynb_checkpoints/vostok_hourly_from_10_s_without_calibration_and_empty-checkpoint.tsv

File diff suppressed because it is too large Load Diff

BIN
data/Vostok/.npy

Binary file not shown.

40528
data/Vostok/vostok_1998_2004_hourly_80percent_all.tsv

File diff suppressed because it is too large Load Diff

BIN
data/Vostok/vostok_2006_2020_results.npz

Binary file not shown.

6004
data/Vostok/vostok_daily_pressure_mm_hg.csv

File diff suppressed because it is too large Load Diff

6004
data/Vostok/vostok_daily_temp.csv

File diff suppressed because it is too large Load Diff

6004
data/Vostok/vostok_daily_wind.csv

File diff suppressed because it is too large Load Diff

BIN
data/Vostok/vostok_diurnal_2006_2020.npy

Binary file not shown.

118789
data/Vostok/vostok_hourly_from_10_s_without_calibration_and_empty.tsv

File diff suppressed because it is too large Load Diff

70352
data/Vostok/vostok_hourly_from_5_min_without_calibration_and_empty.tsv

File diff suppressed because it is too large Load Diff

BIN
data/WRF/WRF_HOURLY_TOTAL_IP_1000.npy

Binary file not shown.

BIN
data/WRF/WRF_HOURLY_TOTAL_IP_1200.npy

Binary file not shown.

BIN
data/WRF/WRF_HOURLY_TOTAL_IP_500_T2_25.npy

Binary file not shown.

BIN
data/WRF/WRF_HOURLY_TOTAL_IP_800.npy

Binary file not shown.

BIN
data/WRF/WRF_IP_1000_LATxMON.npy

Binary file not shown.

BIN
data/WRF/WRF_IP_1200_LATxMON.npy

Binary file not shown.

BIN
data/WRF/WRF_IP_500_T2_25_LATxMON.npy

Binary file not shown.

BIN
data/WRF/WRF_IP_800_LATxMON.npy

Binary file not shown.

BIN
data/WRF/WRF_NUMDAYS_MON.npy

Binary file not shown.

BIN
data/WRF/WRF_T2_LATxMON.npy

Binary file not shown.

102
readme.md

@@ -0,0 +1,102 @@
# Short Description of the Scripts
> **_Note:_** For analysis, we use ionospheric potential simulation data obtained with climate models. Since these data are very large (around 350 GB), we only upload preprocessed lower-dimensional data (around 20 MB) to the repository. The data can be prepared with the script `0_prepare_data.ipynb`, but this requires downloading large files from https://eee.ipfran.ru/files/seasonal-variation-2024/.
* `1_Earlier_measurements_images.ipynb` plots seasonal variations from external sources
* `2_Vostok_measurements_images.ipynb` plots seasonal variations and a seasonal-diurnal diagram using new and earlier Vostok PG measurements
* `3_WRF_T2_images.ipynb` plots seasonal variation of `T2m` temperature averaged across different latitude bands
* `4_IP_simulations_temporal_images.ipynb` plots seasonal variation of simulated IP grouped by datasets and different year ranges
* `5_IP_simulations_spatial_images.ipynb` plots seasonal variation of simulated IP grouped by latitude ranges
> **_Note:_** The scripts should be executed sequentially one after another; at the very least, scripts 4 and 5 should be run after script 2. This is necessary because script 2 saves intermediate arrays of preprocessed data from the Vostok station, which are used in scripts 4 and 5.
# Detailed Description of the Scripts
## Script `1_Earlier_measurements_images.ipynb`
This program contains digitized data from external sources, necessary for constructing Figure 1.1.
At the beginning of the script, the necessary libraries are loaded and arrays with digitized data are declared; at the end, a graph is constructed.
Data analysis in this file is minimal: it calculates the amplitude of the seasonal variation (as a percentage of the annual mean value).
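The amplitude computation can be sketched as follows (a minimal illustration; the function name `seasonal_amplitude_percent` is ours, and peak-to-peak relative to the annual mean is one possible definition of the amplitude):

```python
import numpy as np

def seasonal_amplitude_percent(monthly_means: np.ndarray) -> float:
    """Peak-to-peak amplitude of the seasonal variation, expressed
    as a percentage of the annual mean value."""
    return float((monthly_means.max() - monthly_means.min())
                 / monthly_means.mean() * 100.0)
```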
## Script `2_Vostok_measurements_images.ipynb`
This script is rather long (for details, see the comments in the code).
First, the declaration of the digitized data is repeated in the code (in this case, only for the earlier data from the Vostok station, which are also used in the first script).
### Preparing PG data
Second, measurement data from the Vostok station (pre-averaged by the hour) are loaded into pandas dataframes: both the new (`df_10s` and `df_5min`) and the earlier (`earlier_df_src`) datasets.
New measurements at the Vostok station are combined from hourly data derived from 10-second files and hourly data derived from 5-minute files; it should be noted that the dataset primarily relies on the 10-second data, and the 5-minute data are only used when the 10-second data were unavailable (there were 24 such hours in 2013, 312 in 2015, 1752 in 2017, and 3600 in 2020). The composite series of new measurements is saved in the dataframe `df_src`.
Next, we introduce helper functions. Notably, the `pass_fair_weather` function, when applied to a dataframe, retains only those days when (1) there were no gaps, (2) the potential gradient did not exceed 300 V/m and was non-negative, and (3) the peak-to-peak amplitude was no more than 150% of the average daily value of the potential gradient.
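These three criteria can be sketched as a dataframe filter (a minimal illustration rather than the repository's actual implementation; the `Datetime` and `Field` columns follow the data description in this readme):

```python
import pandas as pd

def pass_fair_weather(df: pd.DataFrame) -> pd.DataFrame:
    """Keep only days satisfying the fair-weather criteria:
    (1) no gaps (all 24 hourly values present),
    (2) 0 <= PG <= 300 V/m for every hour,
    (3) peak-to-peak amplitude <= 150% of the daily mean PG."""
    df = df.copy()
    df["date"] = df["Datetime"].dt.date

    def is_fair(day: pd.DataFrame) -> bool:
        pg = day["Field"]
        if pg.isna().any() or len(pg) < 24:
            return False
        if (pg < 0).any() or (pg > 300).any():
            return False
        return (pg.max() - pg.min()) <= 1.5 * pg.mean()

    return df.groupby("date").filter(is_fair).drop(columns="date")
```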
The next helper functions to mention are `calculate_seasonal_var_params` and `std_error`.
They are structured so that the first function takes a dataframe of daily mean values and returns (1) an array of 12 monthly mean PG values, (2) an array of 12 counts of fair-weather days per month, and (3) an array of 12 mean squares of the daily mean PG values of fair-weather days, given by the following formula:
sumₘ = Σᵢ(daily mean PG for the `i`-th fair weather day)² / (count of fair weather days in month `m`),
where `m = 1...12` denotes the month number and `i` runs over all fair-weather days whose date falls in month `m`.
The `std_error` function takes the output of the `calculate_seasonal_var_params` function and returns 12 values of the standard error, one for each month.
Both described functions are used to compute values necessary for plotting graphs (mean value ± standard error).
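The relationship between these outputs and the standard error can be sketched as follows (a minimal illustration with hypothetical implementations, not the repository's actual code; the input is assumed to be a dataframe with a `DatetimeIndex` and a `Field` column of daily mean PG values):

```python
import numpy as np
import pandas as pd

def calculate_seasonal_var_params(daily_df: pd.DataFrame):
    """Return, for each of the 12 months: the mean daily PG, the number
    of fair-weather days, and the mean of the squared daily PG values."""
    months = daily_df.index.month
    mean = np.array([daily_df["Field"][months == m].mean()
                     for m in range(1, 13)])
    counts = np.array([(months == m).sum() for m in range(1, 13)])
    mean_sq = np.array([(daily_df["Field"][months == m] ** 2).mean()
                        for m in range(1, 13)])
    return mean, counts, mean_sq

def std_error(mean, counts, mean_sq):
    """Standard error of each monthly mean: sqrt(sample variance / n),
    with the variance recovered from E[x^2] - (E[x])^2 (needs n >= 2)."""
    var = np.clip((mean_sq - mean**2) * counts / (counts - 1), 0, None)
    return np.sqrt(var / counts)
```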
For both the new and the earlier Vostok data, we apply the `pass_fair_weather` function, resulting in two datasets that contain only the hours of fair-weather days (`df` and `earlier_df`).
### Figure 1.2
To construct Figure 1.2, using the prepared data and helper functions, we calculate the mean values, the count of fair weather days and standard errors for three sets of data:
1. The complete series of new Vostok data.
2. The same series up to and including the year 2012.
3. The same series after the year 2012.
> **_Note:_** The data from this figure is saved in the temporary file `vostok_2006_2020_results.npz` for use in the second article. This helps avoid code duplication or merging code to build different entities in a single cumbersome file.
### Figure 1.3
To construct Figure 1.3, we transform the Vostok data series into a matrix of 12 months x 24 hours. To do this, we group the original dataframe of fair weather hours by months and hours, and then find the mean value for all data points taken at a specific hour of a specific month (saved in dataframe `sd_df`).
For clarity, we also present slices of this diurnal-seasonal diagram at 3, 9, 15, and 21 hours UTC.
> **_Note:_** Renaming the axes of the multi-index resulting from grouping (`sd_df.index.set_names(['hour', 'month'], inplace=True)`) is not necessary for the code and can be commented out; however, it may be convenient for further work with the diurnal-seasonal dataframe `sd_df`.
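The grouping that produces the months-by-hours matrix can be sketched like this (a simplified stand-in for the construction of `sd_df`; the function name is ours):

```python
import numpy as np
import pandas as pd

def seasonal_diurnal_matrix(df: pd.DataFrame) -> np.ndarray:
    """Average hourly PG values over (month, hour of day), returning a
    matrix with one row per month present and one column per hour UTC."""
    grouped = df.groupby(
        [df["Datetime"].dt.month.rename("month"),
         df["Datetime"].dt.hour.rename("hour")]
    )["Field"].mean()
    return grouped.unstack("hour").to_numpy()
```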
### Figure 1.5
#### Removal of field anomalies associated with meteorological parameters
First, we load the meteorological datasets (`temp_df`, `wind_df`, `pressure_df`), averaged by day (`vostok_daily_temp`, `vostok_daily_wind`, `vostok_daily_pressure_mm_hg`). For further analysis, we use the `meteo_df` dataframe, created by merging these with the dataframe of daily mean potential gradient values (`daily_df`).
Next, we compile arrays of PG anomalies and anomalies of all meteorological parameters. An anomaly is calculated with respect to a moving window of ±10 days.
We then find the regression coefficients `temp_coeffs`, `wind_coeffs`, and `pres_coeffs` between the PG anomaly and the corresponding meteorological parameter anomalies, and calculate some statistical characteristics.
Using the found regression coefficients, we remove the linear relationship with meteorological parameter anomalies. The corrected PG is saved in `meteo_df["CorrectedField"]`.
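The anomaly computation and the removal of the linear dependence can be sketched as follows (a minimal sketch assuming a regular daily series; the function names are ours, and `np.polyfit` stands in for whatever regression routine the script actually uses):

```python
import numpy as np
import pandas as pd

def anomaly(series: pd.Series, window_days: int = 10) -> pd.Series:
    """Deviation from a centred running mean over +-window_days days."""
    running_mean = series.rolling(2 * window_days + 1,
                                  center=True, min_periods=1).mean()
    return series - running_mean

def remove_linear_dependence(pg: pd.Series, meteo: pd.Series) -> pd.Series:
    """Subtract from the PG anomaly its linear regression
    on a meteorological-parameter anomaly."""
    pg_anom, met_anom = anomaly(pg), anomaly(meteo)
    slope, intercept = np.polyfit(met_anom, pg_anom, deg=1)
    return pg_anom - (slope * met_anom + intercept)
```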
Finally, we construct Figure 1.5 using the prepared data in the same manner as was done for Figures 1.2 and 1.3.
## Script `3_WRF_T2_images.ipynb`
This script calculates the seasonal variation of the 2m-level temperature (T2m) taken from climate modeling results (see article 1).
In the script, temperature data averaged over longitude and over months are loaded from `WRF_T2_LATxMON.npy` (see the data description below).
Next, the temperature is averaged across the latitude bands 20° S–20° N, 30° S–30° N, 40° S–40° N, and 50° S–50° N. The averaging takes the latitudinal area factor into account: degree cells at higher latitudes enter the sum with a diminishing coefficient. The results (the seasonal temperature variation in each latitude band) are displayed in Figures 1.4 and 2.3, each consisting of four panels.
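The area-weighted band averaging can be sketched as follows (a minimal illustration under the assumption that row `i` of the `(180, 12)` array spans latitudes `i - 90` to `i - 89` degrees and that the area factor is the cosine of latitude; the function name is ours):

```python
import numpy as np

def band_average(data_latxmon: np.ndarray,
                 lat_min: float, lat_max: float) -> np.ndarray:
    """Average a (180, 12) LAT x MON array over a latitude band,
    weighting each 1-degree row by the cosine of its central latitude
    (the latitudinal area factor)."""
    centres = np.arange(180) - 89.5  # central latitude of each row
    mask = (centres >= lat_min) & (centres <= lat_max)
    weights = np.cos(np.deg2rad(centres[mask]))
    return (data_latxmon[mask] * weights[:, None]).sum(axis=0) / weights.sum()
```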
## Script `4_IP_simulations_temporal_images.ipynb`
...
## Script `5_IP_simulations_spatial_images.ipynb`
...
# Description of the data files
* `WRF_T2_LATxMON.npy` - a `numpy` array with the shape `(180, 12)` containing the monthly averaged 2-meter temperature (`T2m`) for each 1° latitude band, averaged over the full 360° of longitude. `T2m` values are calculated with the Weather Research and Forecasting (WRF) model, version 4.3.
* `vostok_hourly <...>.tsv` - text files containing two columns: the date and time (column `Datetime`) and hourly averaged potential gradient (PG) values based on the latest measurements at the Russian Antarctic station Vostok (column `Field`, in V/m).
* `vostok_1998_2004_hourly_80percent_all.tsv` - the same as the previous files, but containing earlier data collected by a different sensor during 1998–2004.
* `vostok_daily <...>.csv` - text files containing three columns: the date (column `UTC`), the daily averaged meteorological parameter based on measurements at the Russian Antarctic station Vostok, and the number of measurements per day (column `Count`; entries with fewer than 4 measurements must be filtered out before analysis).
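A loading routine respecting this filtering rule might look like this (a sketch; the function name, and the `Temp` column used in the test, are hypothetical):

```python
import pandas as pd

def load_daily_meteo(path: str, value_col: str) -> pd.DataFrame:
    """Load a daily meteo file (columns: UTC, <value>, Count) and drop
    days with fewer than 4 measurements, as required before analysis."""
    df = pd.read_csv(path, parse_dates=["UTC"])
    df = df[df["Count"] >= 4].reset_index(drop=True)
    return df[["UTC", value_col, "Count"]]
```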