{
 "cells": [
  {
   "cell_type": "markdown",
   "id": "3ec138f9",
   "metadata": {},
   "source": [
    "# Process an activity report"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "a5202118",
   "metadata": {},
   "source": [
    "This example shows how to analyze and display an activity report.\n",
    "\n",
    "This example makes extensive use of the [pandas](https://pandas.pydata.org/) and\n",
    "[plotly](https://plotly.com/python/) libraries. Examples using these libraries are presented here largely without\n",
    "explanation. Consult the documentation for these packages to understand any limitations of the approaches demonstrated\n",
    "here, and also how to modify and extend these examples."
   ]
  },
  {
   "cell_type": "markdown",
   "id": "70048065",
   "metadata": {},
   "source": [
    "## Fetch the activity report"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "a9af0940",
   "metadata": {},
   "source": [
    "Import the ``Connection`` class, create the connection, and fetch the activity report for the 12 months between July\n",
    "2024 and June 2025. For more information, see the [Basic Usage](../1_Basic_usage.ipynb) and\n",
    "[Access an activity report](2-1_Access_activity_report.ipynb) examples."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "d810dfef",
   "metadata": {},
   "outputs": [],
   "source": [
    "from datetime import date\n",
    "\n",
    "from ansys.grantami.system import ActivityReportFilter, Connection\n",
    "\n",
    "connection = Connection(\"http://my_grantami_server/mi_servicelayer\").with_autologon()\n",
    "client = connection.connect()\n",
    "date_filter = (\n",
    "    ActivityReportFilter()\n",
    "    .with_date_from(date.fromisoformat(\"2024-07-01\"), inclusive=True)\n",
    "    .with_date_to(date.fromisoformat(\"2025-06-30\"), inclusive=True)\n",
    ")\n",
    "items = client.get_activity_report_where(date_filter)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "3833130c",
   "metadata": {},
   "source": [
    "## Process the activity report using a DataFrame"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "2cb0e751",
   "metadata": {},
   "source": [
    "Prepare the activity report items as a list of dictionaries, where the information for a single activity log entry is\n",
    "represented as a dictionary.\n",
    "\n",
    "ActivityItem is a dataclass, so use the standard library `dataclasses.asdict()` function to create a dictionary for\n",
    "each item."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "bcca7f31",
   "metadata": {},
   "outputs": [],
   "source": [
    "from dataclasses import asdict\n",
    "\n",
    "rows = [asdict(item) for item in items]\n",
    "rows[0]"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "4f830f4e",
   "metadata": {},
   "source": [
    "Now use this data representation to create a dataframe:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "35ed046f",
   "metadata": {},
   "outputs": [],
   "source": [
    "import pandas as pd\n",
    "\n",
    "df_raw = pd.DataFrame(rows)\n",
    "df_raw.head()"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "6d1f7afb",
   "metadata": {},
   "source": [
    "Process the data:\n",
    "\n",
    "1. `usage_mode`: Extract the value from the Enum using `DataFrame.map()`.\n",
    "2. `application_names`: Flatten the structure using `DataFrame.explode()` and `DataFrame.drop_duplicates()`.\n",
    "3. `activity_date`: Convert the builtin `datetime.date` object to the more powerful `pandas.Timestamp` object."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "ba34d496",
   "metadata": {},
   "outputs": [],
   "source": [
    "df_processed = df_raw.copy()\n",
    "\n",
    "# 1. Extract value from enum\n",
    "df_processed[\"usage_mode\"] = df_raw[\"usage_mode\"].map(lambda x: x.value)\n",
    "\n",
    "# 2. Flatten application_names\n",
    "df_processed = df_processed.explode(column=\"application_names\")\n",
    "df_processed = df_processed.drop_duplicates()\n",
    "\n",
    "# 3. Convert date to pandas.Timestamp\n",
    "df_processed[\"activity_date\"] = df_processed[\"activity_date\"].map(lambda x: pd.Timestamp.fromisoformat(x.isoformat()))\n",
    "\n",
    "df_processed.head()"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "b3d56fdb",
   "metadata": {},
   "source": [
    "The data can be written to an Excel file using the `to_excel()` method."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "1fd2b007",
   "metadata": {},
   "outputs": [],
   "source": [
    "df_processed.to_excel(\"activity_data.xlsx\")"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "547d95f0",
   "metadata": {},
   "source": [
    "## Aggregated usage by month"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "40bebf50",
   "metadata": {},
   "source": [
    "A common use case is to track the unique Granta MI users per month.\n",
    "\n",
    "Use the `pd.pivot_table()` method to perform a pivot on the data:\n",
    "* Specify `username` as the column that contains the values of interest.\n",
    "* Use a `pd.Grouper()` object for the index. Use the `MS` frequency to group the `activity_date` column by month.\n",
    "* Use `nunique()` to count the unique `username` values in each `activity_date` group.\n",
    "\n",
    "For a more detailed description of `pivot_table()`, `Grouper()`, and `nunique()`, consult the pandas documentation."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "db945d93",
   "metadata": {},
   "outputs": [],
   "source": [
    "df_activity_per_month = pd.pivot_table(\n",
    "    data=df_processed,\n",
    "    values=\"username\",\n",
    "    index=pd.Grouper(key=\"activity_date\", freq=\"MS\"),\n",
    "    aggfunc=lambda x: x.nunique(),\n",
    ")\n",
    "df_activity_per_month.head()"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "988dc05c",
   "metadata": {},
   "source": [
    "Finally, plot a bar chart using the `plotly` express API. See the plotly documentation for more information on the\n",
    "methods used in the cell below."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "d1892207",
   "metadata": {},
   "outputs": [],
   "source": [
    "import plotly.express as px\n",
    "\n",
    "fig = px.bar(\n",
    "    df_activity_per_month,\n",
    "    x=df_activity_per_month.index,\n",
    "    y=\"username\",\n",
    "    title=\"Unique users per month\",\n",
    "    labels={\n",
    "        \"username\": \"Unique user count\",\n",
    "        \"activity_date\": \"Month\",\n",
    "    },\n",
    ")\n",
    "fig.update_xaxes(showgrid=True, dtick=\"M1\", tickformat=\"%b\\n%Y\")\n",
    "fig.update_layout(bargap=0.1)\n",
    "\n",
    "fig.show()"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "eb8cf8ec",
   "metadata": {},
   "source": [
    "## Aggregated usage by month and database"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "e2d1d37b",
   "metadata": {},
   "source": [
    "Alternatively, include the database key in the pivot to create a plot of unique users per database per month.\n",
    "\n",
    "First, filter out rows which do not relate to a particular database. Then, use the `pd.pivot_table()` method\n",
    "again, but add the `database_key` column to the index."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "8c5c3897",
   "metadata": {},
   "outputs": [],
   "source": [
    "df_database_rows = df_processed[df_processed[\"database_key\"].notnull()]\n",
    "\n",
    "df_activity_per_database_per_month = pd.pivot_table(\n",
    "    data=df_database_rows,\n",
    "    values=\"username\",\n",
    "    index=[\"database_key\", pd.Grouper(key=\"activity_date\", freq=\"MS\")],\n",
    "    aggfunc=lambda x: x.nunique(),\n",
    ")\n",
    "df_activity_per_database_per_month.head()"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "4a0d0db8",
   "metadata": {},
   "source": [
    "Iterate over each sub-frame grouped by the `database_key` index level, and create a separate bar chart trace\n",
    "for each. Then combine the separate traces on a single figure, and set the display mode to `\"stack\"`.\n",
    "\n",
    "This cell uses the more powerful `graph_objects` API in plotly. See the plotly documentation for more information."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "144de17e",
   "metadata": {},
   "outputs": [],
   "source": [
    "import plotly.graph_objects as go\n",
    "\n",
    "# Create the figure\n",
    "fig = go.Figure()\n",
    "\n",
    "# Iterate over each value in the \"database_key\" index level\n",
    "for database_key, df_activity_per_month in df_activity_per_database_per_month.groupby(\"database_key\"):\n",
    "    trace = go.Bar(\n",
    "        name=database_key,\n",
    "        x=df_activity_per_month.index.get_level_values(\"activity_date\"),\n",
    "        y=df_activity_per_month.username,\n",
    "    )\n",
    "    fig.add_trace(trace)\n",
    "\n",
    "fig.update_xaxes(title=\"Month\", showgrid=True, dtick=\"M1\", tickformat=\"%b\\n%Y\")\n",
    "fig.update_yaxes(title=\"Unique user count\")\n",
    "fig.update_layout(title=\"Unique users per month, grouped by database\", barmode=\"stack\", bargap=0.1)\n",
    "fig.show()"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "f1c4891a",
   "metadata": {},
   "source": [
    "## Aggregated usage by usage mode"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "fa04e92c",
   "metadata": {},
   "source": [
    "A different way of grouping users is by whether they are read or write users of Granta MI.\n",
    "\n",
    "<div class=\"alert alert-info\">\n",
    "\n",
    "**Info:**\n",
    "\n",
    "The usage mode is only available on activity report items created by Granta MI 2026 R1 and later. All activity report\n",
    "items created by Granta MI 2025 R2 and earlier are returned as 'view' activities.\n",
    "</div>\n",
    "\n",
    "First, use the `pd.pivot_table()` method to perform a pivot on the data:\n",
    "* Specify `username` as the column that contains the values of interest.\n",
    "* Specify `usage_mode` as the index to group the data by usage mode.\n",
    "* Use `unique()` to return the unique usernames in each usage mode group.\n",
    "\n",
    "Next, subtract the set of 'edit' users from the set of 'view' users. This avoids double-counting users who both edit\n",
    "and view data.\n",
    "\n",
    "Finally, calculate the length of each set of users by using `.apply()` to apply the `len()` function to each cell."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "6ecee772",
   "metadata": {},
   "outputs": [],
   "source": [
    "df_by_usage_mode = df_processed.pivot_table(\n",
    "    values=\"username\",\n",
    "    index=\"usage_mode\",\n",
    "    aggfunc=lambda x: set(x.unique()),\n",
    ")\n",
    "# The 'view' column in df_by_usage_mode now contains sets of users in each group\n",
    "df_by_usage_mode.loc[\"view\"] = df_by_usage_mode.loc[\"view\"] - df_by_usage_mode.loc[\"edit\"]\n",
    "df_by_usage_mode[\"username\"] = df_by_usage_mode[\"username\"].apply(len)\n",
    "df_by_usage_mode"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "7d4b060c",
   "metadata": {},
   "source": [
    "Plot a bar chart the `plotly` express API."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "e73bf8f0",
   "metadata": {},
   "outputs": [],
   "source": [
    "fig = px.bar(\n",
    "    df_by_usage_mode,\n",
    "    x=df_by_usage_mode.index,\n",
    "    y=\"username\",\n",
    "    title=\"Unique users by usage mode\",\n",
    "    labels={\n",
    "        \"username\": \"Unique user count\",\n",
    "        \"usage_mode\": \"Usage mode\",\n",
    "    },\n",
    ")\n",
    "fig.show()"
   ]
  }
 ],
 "metadata": {
  "jupytext": {
   "formats": "ipynb,py:light"
  },
  "kernelspec": {
   "display_name": "Python 3 (ipykernel)",
   "language": "python",
   "name": "python3"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 5
}