{
"cells": [
{
"cell_type": "markdown",
"id": "dd50d20e",
"metadata": {
"tags": [
"parameters",
"remove-cell"
]
},
"source": [
"\n",
"Before starting, remember to activate the environment:\n",
"**source env/bin/activate**\n",
"\n",
"\n"
]
},
{
"cell_type": "code",
"execution_count": 1,
"id": "6f7d98ce-47ad-4b54-b820-57e676d269b0",
"metadata": {
"execution": {
"iopub.execute_input": "2025-11-10T13:24:53.606230Z",
"iopub.status.busy": "2025-11-10T13:24:53.605388Z",
"iopub.status.idle": "2025-11-10T13:24:55.377406Z",
"shell.execute_reply": "2025-11-10T13:24:55.376812Z"
},
"tags": [
"remove-input"
]
},
"outputs": [],
"source": [
"from md_intrinsic_dimension import intrinsic_dimension, section_id, secondary_structure_id\n",
"from moleculekit.molecule import Molecule"
]
},
{
"cell_type": "markdown",
"id": "76030250",
"metadata": {},
"source": [
"# Intrinsic Dimension\n"
]
},
{
"cell_type": "markdown",
"id": "cee55a4c",
"metadata": {},
"source": [
"The ``intrinsic_dimension`` function computes the intrinsic dimension (ID) of the system over the entire molecular dynamics trajectory.\n",
"\n",
"If ``id_method`` is ``local``, the function returns:\n",
"* the mean instantaneous ID over the entire trajectory;\n",
"* the mean instantaneous ID over the final ``last`` frames of the trajectory;\n",
"* the instantaneous ID of each frame of the trajectory."
]
},
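{
"cell_type": "markdown",
"id": "b7e41c02",
"metadata": {},
"source": [
"The three values returned in ``local`` mode are related. As a minimal sketch, assuming that both means are plain arithmetic averages of the per-frame array and that ``last`` counts frames back from the end of the trajectory, they could be reproduced from the instantaneous IDs with NumPy. The array below is made up for illustration; it is not output of the package.\n",
"\n",
"```python\n",
"import numpy as np\n",
"\n",
"# Made-up per-frame (instantaneous) ID values standing in for local_id\n",
"local_id = np.array([25.3, 25.3, 25.6, 24.8, 25.6, 25.2, 25.4])\n",
"last = 3  # assumed meaning: number of final frames to average over\n",
"\n",
"# Mean over every frame of the trajectory\n",
"mean_all = local_id.mean()\n",
"# Mean over the final `last` frames only\n",
"mean_last = local_id[-last:].mean()\n",
"\n",
"print(mean_all, mean_last)\n",
"```"
]
},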
{
"cell_type": "code",
"execution_count": 2,
"id": "65e0cb7d",
"metadata": {
"execution": {
"iopub.execute_input": "2025-11-10T13:24:55.381307Z",
"iopub.status.busy": "2025-11-10T13:24:55.381005Z",
"iopub.status.idle": "2025-11-10T13:25:03.264027Z",
"shell.execute_reply": "2025-11-10T13:25:03.263219Z"
}
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Mean instantaneous ID of the entire trajectory: 25.05909587663137\n",
"Mean instantaneous ID of the last 100 frames: 25.603595848008236\n",
"Instantaneous ID of the entire trajectory: \n",
" [25.29562521 25.3016539 25.58546274 ... 25.56638913 25.2186398\n",
" 25.43423956]\n"
]
}
],
"source": [
"mean_all, mean_last, local_id = intrinsic_dimension(\n",
"    topology='examples/villin/2f4k.pdb', trajectory='examples/villin/2f4k_f0.xtc',\n",
"    projection_method='Dihedral', id_method='local', verbose=False)\n",
"\n",
"print('Mean instantaneous ID of the entire trajectory:', mean_all)\n",
"print('Mean instantaneous ID of the last 100 frames:', mean_last)\n",
"print('Instantaneous ID of the entire trajectory: \\n', local_id[5:])"
]
},
{
"cell_type": "markdown",
"id": "ee8e03dd",
"metadata": {},
"source": [
"If ``id_method`` is ``global``, the function returns:\n",
"* the global ID computed over the entire trajectory;\n",
"* the global ID computed over the final ``last`` frames of the trajectory.\n"
]
},
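{
"cell_type": "markdown",
"id": "c5d83a19",
"metadata": {},
"source": [
"The notebook does not state which estimator ``intrinsic_dimension`` uses internally. For intuition only, the sketch below implements one widely used global estimator, the Two-NN maximum-likelihood estimate, which relies only on the ratio between each point's second and first nearest-neighbour distances. Treat it as an illustrative assumption, not the package's method: each row of ``X`` stands for one trajectory frame projected onto some feature space (for example dihedral angles), and the function name ``two_nn_id`` is hypothetical.\n",
"\n",
"```python\n",
"import numpy as np\n",
"\n",
"\n",
"def two_nn_id(X):\n",
"    # Hypothetical helper: global ID of a point cloud X (n_points x n_features)\n",
"    X = np.asarray(X, dtype=float)\n",
"    sq = np.sum(X * X, axis=1)\n",
"    # Squared distances from |a - b|^2 = |a|^2 + |b|^2 - 2 a.b\n",
"    d2 = sq[:, None] + sq[None, :] - 2.0 * X @ X.T\n",
"    np.maximum(d2, 0.0, out=d2)  # clip tiny negatives caused by round-off\n",
"    np.fill_diagonal(d2, np.inf)  # exclude each point's distance to itself\n",
"    # First and second smallest squared distances in each row\n",
"    nn = np.partition(d2, 1, axis=1)[:, :2]\n",
"    mu = np.sqrt(nn[:, 1] / nn[:, 0])  # ratio r2 / r1 for every point\n",
"    # mu is Pareto-distributed with exponent d, whose MLE is N / sum(log mu)\n",
"    return len(mu) / np.sum(np.log(mu))\n",
"\n",
"\n",
"rng = np.random.default_rng(0)\n",
"# 1500 points uniform on a square, linearly embedded in 10 dimensions\n",
"X = rng.uniform(size=(1500, 2)) @ rng.normal(size=(2, 10))\n",
"estimate = two_nn_id(X)\n",
"print(estimate)  # expected to be close to 2\n",
"```\n",
"\n",
"Because it builds the full pairwise-distance matrix, this sketch is quadratic in memory and only suitable for small numbers of frames."
]
},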
{
"cell_type": "code",
"execution_count": 3,
"id": "6f6ae2bb",
"metadata": {
"execution": {
"iopub.execute_input": "2025-11-10T13:25:03.269182Z",
"iopub.status.busy": "2025-11-10T13:25:03.268764Z",
"iopub.status.idle": "2025-11-10T13:25:03.691146Z",
"shell.execute_reply": "2025-11-10T13:25:03.690470Z"
}
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Global ID of the entire trajectory: 25.89533641150811\n",
"Global ID of the last 100 frames: 27.14144547061944\n"
]
}
],
"source": [
"global_all, global_last = intrinsic_dimension(\n",
"    topology='examples/villin/2f4k.pdb', trajectory='examples/villin/2f4k_f0.xtc',\n",
"    projection_method='Dihedral', id_method='global', verbose=False)\n",
"\n",
"print('Global ID of the entire trajectory:', global_all)\n",
"print('Global ID of the last 100 frames:', global_last)"
]
},
{
"cell_type": "markdown",
"id": "e95cd257-709a-447a-9439-050f82f3db18",
"metadata": {},
"source": [
"# Section ID"
]
},
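{
"cell_type": "markdown",
"id": "a09cb899-c8b9-47b1-9268-b9446d4ae815",
"metadata": {},
"source": [
"The ``section_id`` function computes the ID over **sliding windows** of residues along the protein sequence.\n",
"\n",
"**Additional Parameters**\n",
"\n",
"- ``window_size`` (int): window length in residues (default = 10)\n",
"- ``stride`` (int): number of residues between the first residues of two consecutive windows (default = 1)\n",
"\n",
"Returns a DataFrame with one row per window, giving the ``start`` and ``end`` residues of the window, its ID over the entire simulation and over the final part of the simulation, and its instantaneous (per-frame) ID values."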
]
},
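{
"cell_type": "markdown",
"id": "e2f96b47",
"metadata": {},
"source": [
"One way to picture how ``window_size`` and ``stride`` define the windows is the sketch below. It assumes, consistently with the ``start`` and ``end`` columns of the example output in the next cell, that windows are inclusive residue ranges and that a final window shorter than ``window_size`` is dropped; the helper ``make_windows`` is hypothetical and not part of the package.\n",
"\n",
"```python\n",
"def make_windows(resids, window_size=10, stride=1):\n",
"    # Hypothetical helper: (start, end) residue IDs of each sliding window\n",
"    windows = []\n",
"    # Step by `stride`; stop once a full window no longer fits\n",
"    for i in range(0, len(resids) - window_size + 1, stride):\n",
"        window = resids[i:i + window_size]\n",
"        windows.append((window[0], window[-1]))\n",
"    return windows\n",
"\n",
"\n",
"# Residues 42-76, matching the first and last residues in the example output\n",
"resids = list(range(42, 77))\n",
"windows = make_windows(resids, window_size=15, stride=5)\n",
"print(windows)\n",
"# [(42, 56), (47, 61), (52, 66), (57, 71), (62, 76)]\n",
"```\n",
"\n",
"With these settings the five windows coincide with the ``start`` and ``end`` columns printed by ``section_id`` in the next cell."
]
},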
{
"cell_type": "code",
"execution_count": 4,
"id": "f6baa084-42a1-43c4-8991-3577eb00263a",
"metadata": {
"execution": {
"iopub.execute_input": "2025-11-10T13:25:03.693567Z",
"iopub.status.busy": "2025-11-10T13:25:03.693232Z",
"iopub.status.idle": "2025-11-10T13:25:17.928907Z",
"shell.execute_reply": "2025-11-10T13:25:17.928141Z"
}
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"ID table: \n",
" start end entire simulation last simulation \\\n",
"0 42 56 15.020170 15.107589 \n",
"1 47 61 15.256653 15.226222 \n",
"2 52 66 15.899548 15.847965 \n",
"3 57 71 16.098079 16.012590 \n",
"4 62 76 15.407374 15.417718 \n",
"\n",
" instantaneous \n",
"0 [15.16317145884061, 14.100304302156, 15.245204... \n",
"1 [15.318657037027966, 14.668348024880796, 14.71... \n",
"2 [15.828122242566746, 15.889626441326188, 15.31... \n",
"3 [16.425784249177884, 16.798194425179883, 15.09... \n",
"4 [16.13297162330056, 15.402657321288679, 15.740... \n"
]
}
],
"source": [
"results = section_id(\n",
"    topology='examples/villin/2f4k.pdb', trajectory='examples/villin/2f4k_f0.xtc',\n",
"    window_size=15, stride=5, projection_method='Dihedral', verbose=False)\n",
"print(f'ID table: \\n {results.head()}')"
]
},
{
"cell_type": "markdown",
"id": "0d2649c0",
"metadata": {},
"source": [
"# Secondary Structure ID"
]
},
{
"cell_type": "markdown",
"id": "88e537e2",
"metadata": {},
"source": [
"\n",
"The ``secondary_structure_id`` function computes the ID over **secondary structure elements**, that is, runs of consecutive residues sharing the same DSSP assignment.\n",
"\n",
"**Additional Parameters**\n",
"\n",
"- ``simplified`` (bool): if True (default), uses the simplified DSSP codes coil (C), strand (S) and helix (H); otherwise uses the full DSSP codes: alpha helix (H), isolated beta bridge (B), extended strand (E), 3-10 helix (G), pi helix (I), hydrogen-bonded turn (T), bend (S) and loop or irregular element (a blank space).\n",
"\n",
"**Returns**\n",
"\n",
"- A DataFrame with ID values per secondary structure element\n",
"- A DataFrame with the DSSP assignment of each residue\n"
]
},
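{
"cell_type": "markdown",
"id": "f8a05d63",
"metadata": {},
"source": [
"The rows of the ID table correspond to runs of consecutive residues that share the same DSSP code (for example, residues 44-50 form one helix in the output of the next cell). A minimal sketch of that grouping step, assuming one code per residue taken from a single reference structure, is shown below; the helper ``ss_segments`` is hypothetical, and the codes used are those of residues 42-51 in that output.\n",
"\n",
"```python\n",
"from itertools import groupby\n",
"\n",
"\n",
"def ss_segments(resids, codes):\n",
"    # Hypothetical helper: split residues into runs sharing one DSSP code\n",
"    segments = []\n",
"    # groupby only merges adjacent items with equal keys, i.e. contiguous runs\n",
"    for code, run in groupby(zip(resids, codes), key=lambda pair: pair[1]):\n",
"        window = [resid for resid, _ in run]\n",
"        segments.append({'start': window[0], 'end': window[-1],\n",
"                         'sec str type': code, 'window': window})\n",
"    return segments\n",
"\n",
"\n",
"resids = list(range(42, 52))\n",
"codes = ['C', 'C', 'H', 'H', 'H', 'H', 'H', 'H', 'H', 'C']\n",
"segments = ss_segments(resids, codes)\n",
"for seg in segments:\n",
"    print(seg['start'], seg['end'], seg['sec str type'], seg['window'])\n",
"# 42 43 C [42, 43]\n",
"# 44 50 H [44, 45, 46, 47, 48, 49, 50]\n",
"# 51 51 C [51]\n",
"```\n",
"\n",
"Because only residues 42-51 are used here, the last coil run stops at residue 51 instead of 54 as in the full table."
]
},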
{
"cell_type": "code",
"execution_count": 5,
"id": "bff43afb",
"metadata": {
"execution": {
"iopub.execute_input": "2025-11-10T13:25:17.930967Z",
"iopub.status.busy": "2025-11-10T13:25:17.930758Z",
"iopub.status.idle": "2025-11-10T13:25:40.273819Z",
"shell.execute_reply": "2025-11-10T13:25:40.269651Z"
}
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"ID table:\n",
" start end sec str type window entire simulation \\\n",
"0 42 43 C [42, 43] 1.963018 \n",
"1 44 50 H [44, 45, 46, 47, 48, 49, 50] 9.274704 \n",
"2 51 54 C [51, 52, 53, 54] 5.384093 \n",
"3 55 58 H [55, 56, 57, 58] 5.146533 \n",
"4 59 62 C [59, 60, 61, 62] 5.439168 \n",
"\n",
" last simulation instantaneous \n",
"0 1.959972 [2.0501716461112105, 2.054921447896053, 2.0109... \n",
"1 9.354203 [9.473939433431966, 9.384160489840434, 8.75161... \n",
"2 5.381422 [5.132274328254565, 5.269852589635324, 5.23606... \n",
"3 5.138285 [5.117460673233659, 5.350647768246517, 4.86918... \n",
"4 5.440862 [5.807002533132328, 5.903334979361287, 5.04612... \n",
"\n",
" Secondary structure assignments:\n",
" resid index resname sec str type\n",
"0 42 LEU C\n",
"1 43 SER C\n",
"2 44 ASP H\n",
"3 45 GLU H\n",
"4 46 ASP H\n"
]
}
],
"source": [
"mol_ref = Molecule('examples/villin/2f4k.pdb')\n",
"results, secStr = secondary_structure_id(\n",
"    topology='examples/villin/2f4k.pdb', trajectory='examples/villin/2f4k_f0.xtc',\n",
"    mol_ref=mol_ref, simplified=True, projection_method='Dihedral',\n",
"    id_method='local', verbose=False)\n",
"\n",
"print(f'ID table:\\n {results.head(5)}')\n",
"print(f'\\n Secondary structure assignments:\\n {secStr.head(5)}')"
]
}
],
"metadata": {
"kernelspec": {
"display_name": "env",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.12.3"
}
},
"nbformat": 4,
"nbformat_minor": 5
}