RACIPE Formalism

Functions related to RACIPE simulation.

GRN Parsing and ODE Generation Functions

grins.reg_funcs.psH(nod, fld, thr, hill)

Positive Shifted Hill function.

Parameters:

  • nod (float) –

    The node expression value.

  • fld (float) –

    The fold change.

  • thr (float) –

    The half-maximal threshold value.

  • hill (float) –

    The hill coefficient.

Returns:

  • float

    The value of the Positive Shifted Hill function.

Source code in grins/reg_funcs.py
def psH(nod, fld, thr, hill):
    """
    Positive Shifted Hill function.

    Parameters
    ----------
    nod : float
        The node expression value.
    fld : float
        The fold change.
    thr : float
        The half-maximal threshold value.
    hill : float
        The hill coefficient.

    Returns
    -------
    float
        The value of the Positive Shifted Hill function.
    """
    return (fld + (1 - fld) * (1 / (1 + (nod / thr) ** hill))) / fld
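
A quick numeric check with illustrative values (not part of the library): psH rises from 1/fld at zero input to 1 at saturating input, with the half-maximal shift at nod == thr.

>>> from grins.reg_funcs import psH
>>> psH(0.0, 5.0, 2.0, 4.0)   # 1/fld at zero input
0.2
>>> psH(2.0, 5.0, 2.0, 4.0)   # half-maximal shift at nod == thr
0.6
>>> psH(1e6, 5.0, 2.0, 4.0)   # approaches 1 at saturating input
1.0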

grins.reg_funcs.nsH(nod, fld, thr, hill)

Negative Shifted Hill function.

Parameters:

  • nod (float) –

    The node expression value.

  • fld (float) –

    The fold change.

  • thr (float) –

    The half-maximal threshold value.

  • hill (float) –

    The hill coefficient.

Returns:

  • float

    The value of the Negative Shifted Hill function.

Source code in grins/reg_funcs.py
def nsH(nod, fld, thr, hill):
    """
    Negative Shifted Hill function.

    Parameters
    ----------
    nod : float
        The node expression value.
    fld : float
        The fold change.
    thr : float
        The half-maximal threshold value.
    hill : float
        The hill coefficient.

    Returns
    -------
    float
        The value of the Negative Shifted Hill function.
    """
    return fld + (1 - fld) * (1 / (1 + (nod / thr) ** hill))
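
Similarly, with illustrative values, nsH falls from 1 at zero input to fld at saturating input:

>>> from grins.reg_funcs import nsH
>>> nsH(0.0, 0.1, 2.0, 4.0)   # no repression at zero input
1.0
>>> nsH(2.0, 0.1, 2.0, 4.0)   # halfway between 1 and fld at nod == thr
0.55
>>> nsH(1e6, 0.1, 2.0, 4.0)   # approaches fld at saturating input
0.1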

grins.gen_diffrax_ode.gen_diffrax_odesys(topo_df, topo_name, save_dir='.')

Generate the ODE system code for diffrax based on the given topology dataframe.

Parameters:

  • topo_df (DataFrame) –

    The topology dataframe containing the edges information.

  • topo_name (str) –

    The name of the topology.

  • save_dir (str, default: '.' ) –

    The directory to save the generated code. Defaults to ".".

Returns:

  • None

    Saves the generated file in the directory specified by save_dir.

Source code in grins/gen_diffrax_ode.py
def gen_diffrax_odesys(topo_df, topo_name, save_dir="."):
    """
    Generate the ODE system code for diffrax based on the given topology dataframe.

    Args:
        topo_df (pandas.DataFrame): The topology dataframe containing the edges information.
        topo_name (str): The name of the topology.
        save_dir (str, optional): The directory to save the generated code. Defaults to ".".

    Returns:
        None: Saves the generated file in the directory specified by save_dir.
    """
    # Get the list of parameters, target nodes and source nodes
    param_names_list, target_nodes, source_nodes = gen_param_names(topo_df)
    # List of unique nodes
    unique_nodes = sorted(set(target_nodes + source_nodes))
    # Inititalise a list to store the ODE strings
    ode_list = [
        "import grins.reg_funcs as regfn\n",
        "def odesys(t,y,args):",
        f"\t({', '.join(unique_nodes)}) = y",
        f"\t({', '.join(param_names_list)}) = args",
    ]
    # Loop through the target nodes
    for ni, nod in enumerate(unique_nodes):
        # Get the edges where n is the target node
        target_edges = topo_df[topo_df["Target"] == nod]
        # The diffrax ODE for each node is d_<nod> = <ODE>
        ode_list.append("\t" + f"d_{nod} = {gen_node_ode(target_edges, nod)}")
    # Append the d_y line
    ode_list.append(f"\td_y = ({', '.join([f'd_{nod}' for nod in unique_nodes])})")
    # Append the end line
    ode_list.append("\treturn d_y\n")
    # Write the lines to a file
    with open(f"{save_dir}/{topo_name}.py", "w") as f:
        f.write("\n".join(ode_list))
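
A minimal usage sketch, assuming a hypothetical two-node topology file TS.topo with whitespace-separated Source/Target/Type rows (Type 2 assumed to denote inhibition, as in standard RACIPE .topo files):

from grins.gen_params import parse_topos
from grins.gen_diffrax_ode import gen_diffrax_odesys

# "TS.topo" is a hypothetical file, e.g.:
# Source Target Type
# A      B      2
# B      A      2
topo_df = parse_topos("TS.topo")
gen_diffrax_odesys(topo_df, "TS", save_dir=".")
# Writes ./TS.py, which defines odesys(t, y, args) for use with diffrax.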

Parameters and Initial Conditions Generation Functions

grins.gen_params.parse_topos(topofile, save_cleaned=False)

Parses and cleans the given topofile and returns the dataframe. The topo file is expected to be a tab-separated file with three columns: Source, Target, and Type. Whitespace can also be used to separate the columns.

For node names that are not alphanumeric, the function replaces the non-alphanumeric characters with an underscore and prepends "Node_" if the name does not start with a letter. The cleaned topology file is saved if the save_cleaned flag is set to True.

Parameters:

  • topofile (str) –

    The path to the topofile.

  • save_cleaned (bool, default: False ) –

    If True, save the cleaned topology file. Defaults to False.

Returns:

  • topo_df ( DataFrame ) –

    The parsed dataframe.

Source code in grins/gen_params.py
def parse_topos(topofile: str, save_cleaned: bool = False) -> pd.DataFrame:
    """
    Parses and cleans the given topofile and returns the dataframe. The topo file is expected to be a tab-separated file with three columns: Source, Target, and Type. Whitespace can also be used to separate the columns.

    For node names that are not alphanumeric, the function replaces the non-alphanumeric characters with an underscore and prepends "Node_" if the name does not start with a letter. The cleaned topology file is saved if the save_cleaned flag is set to True.

    Parameters
    ----------
    topofile : str
        The path to the topofile.
    save_cleaned : bool, optional
        If True, save the cleaned topology file. Defaults to False.

    Returns
    -------
    topo_df : pd.DataFrame
        The parsed dataframe.
    """
    topo_df = pd.read_csv(topofile, sep=r"\s+")
    if topo_df.shape[1] != 3:
        raise ValueError(
            "The topology file should have three columns: Source, Target, and Type."
        )
    if not all(col in topo_df.columns for col in ["Source", "Target", "Type"]):
        raise ValueError(
            "The topology file should have the columns: Source, Target, and Type."
        )
    topo_df = topo_df[["Source", "Target", "Type"]]
    # Clean up node names: replace non-alphanumerics and prepend "Node_" if needed.
    topo_df["Source"] = (
        topo_df["Source"]
        .str.replace(r"\W", "_", regex=True)
        .apply(lambda x: f"Node_{x}" if not x[0].isalpha() else x)
    )
    topo_df["Target"] = (
        topo_df["Target"]
        .str.replace(r"\W", "_", regex=True)
        .apply(lambda x: f"Node_{x}" if not x[0].isalpha() else x)
    )
    if topo_df["Type"].nunique() > 2:
        raise ValueError(f"Check the topo file: {topofile}")
    if save_cleaned:
        topo_df.to_csv(
            topofile.replace(".topo", "_cleaned.topo"), sep="\t", index=False
        )
    return topo_df
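
The cleaning rules can be illustrated in isolation with a small sketch that mirrors the replace/prepend logic above (hypothetical node names):

>>> import pandas as pd
>>> names = pd.Series(["p53-var", "2AB", "GeneX"])
>>> (names.str.replace(r"\W", "_", regex=True)
...       .apply(lambda x: f"Node_{x}" if not x[0].isalpha() else x)).tolist()
['p53_var', 'Node_2AB', 'GeneX']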

grins.gen_params.gen_param_names(topo_df)

Generate parameter names based on the given topology dataframe.

Parameters:

  • topo_df (DataFrame) –

    The topology dataframe containing the information about the nodes and edges.

Returns:

  • tuple

    A tuple containing the parameter names, unique target node names, and unique source node names.

Source code in grins/gen_params.py
def gen_param_names(topo_df: pd.DataFrame) -> Tuple[List[str], List[str], List[str]]:
    """
    Generate parameter names based on the given topology dataframe.

    Parameters
    ----------
    topo_df : pd.DataFrame
        The topology dataframe containing the information about the nodes and edges.

    Returns
    -------
    tuple
        A tuple containing the parameter names, unique target node names, and unique source node names.
    """
    target_nodes = list(topo_df["Target"].unique())
    source_nodes = list(topo_df["Source"].unique())
    unique_nodes = sorted(set(source_nodes + target_nodes))
    param_names = [f"Prod_{n}" for n in unique_nodes] + [
        f"Deg_{n}" for n in unique_nodes
    ]
    for tn in unique_nodes:
        sources = topo_df[topo_df["Target"] == tn]["Source"].sort_values()
        for sn, p in it.product(sources, ["Fld", "Thr", "Hill"]):
            param_names.append(
                f"{p}_{sn}_{tn}" if p != "Fld" else _get_regtype(sn, tn, topo_df)
            )
    return param_names, target_nodes, source_nodes
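
For a hypothetical mutual-inhibition ("toggle switch") topology, the expected ordering is all Prod_/Deg_ names first, then a Fld/Thr/Hill triplet per incoming edge, with the Fld names carrying the regulation type (a sketch; Type 2 is assumed to denote inhibition):

import pandas as pd
from grins.gen_params import gen_param_names

topo_df = pd.DataFrame({"Source": ["A", "B"], "Target": ["B", "A"], "Type": [2, 2]})
param_names, target_nodes, source_nodes = gen_param_names(topo_df)
# param_names should resemble:
# ['Prod_A', 'Prod_B', 'Deg_A', 'Deg_B',
#  'InhFld_B_A', 'Thr_B_A', 'Hill_B_A',
#  'InhFld_A_B', 'Thr_A_B', 'Hill_A_B']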

grins.gen_params.sample_distribution(method, dimension, num_points, std_dev=None, optimise=False)

Generates a sample distribution based on the specified method.

Parameters:

  • method (str) –

    The sampling method to use. Options are "Sobol", "LHS", "Uniform", "LogUniform", "Normal", "LogNormal".

  • dimension (int) –

    The number of dimensions for the sample points.

  • num_points (int) –

    The number of sample points to generate.

  • std_dev (Optional[float], default: None ) –

    The standard deviation for the "Normal" and "LogNormal" distributions. Defaults to 1.0 if not provided.

  • optimise (bool, default: False ) –

    Whether to optimise the sampling process. Applicable for "Sobol" and "LHS" methods.

Returns:

  • ndarray

    An array of sample points generated according to the specified method.

Raises:

  • ValueError

    If an unknown sampling method is specified.

Source code in grins/gen_params.py
def sample_distribution(
    method: str,
    dimension: int,
    num_points: int,
    std_dev: Optional[float] = None,
    optimise: bool = False,
) -> np.ndarray:
    """
    Generates a sample distribution based on the specified method.

    Parameters
    ----------
    method : str
        The sampling method to use. Options are "Sobol", "LHS", "Uniform", "LogUniform", "Normal", "LogNormal".
    dimension : int
        The number of dimensions for the sample points.
    num_points : int
        The number of sample points to generate.
    std_dev : Optional[float], optional
        The standard deviation for the "Normal" and "LogNormal" distributions. Defaults to 1.0 if not provided.
    optimise : bool, optional
        Whether to optimise the sampling process. Applicable for "Sobol" and "LHS" methods.

    Returns
    -------
    np.ndarray
        An array of sample points generated according to the specified method.

    Raises
    ------
    ValueError
        If an unknown sampling method is specified.
    """
    if method == "Sobol":
        dist_arr = _gen_sobol_seq(dimension, num_points, optimise)
        # return _gen_sobol_seq(dimension, num_points, optimise)
    elif method == "LHS":
        dist_arr = _gen_latin_hypercube(dimension, num_points, optimise)
        # return _gen_latin_hypercube(dimension, num_points, optimise)
    elif method == "Uniform":
        dist_arr = _gen_uniform_seq(dimension, num_points)
        # return _gen_uniform_seq(dimension, num_points)
    elif method == "LogUniform":
        dist_arr = _gen_loguniform_seq(dimension, num_points)
        # return _gen_loguniform_seq(dimension, num_points)
    elif method == "Normal":
        dist_arr = _gen_normal(
            dimension, num_points, std_dev if std_dev is not None else 1.0
        )
        # return _gen_normal(
        #     dimension, num_points, std_dev if std_dev is not None else 1.0
        # )
    elif method == "LogNormal":
        dist_arr = _gen_lognormal(
            dimension, num_points, std_dev if std_dev is not None else 1.0
        )
        # return _gen_lognormal(
        #     dimension, num_points, std_dev if std_dev is not None else 1.0
        # )
    else:
        raise ValueError(f"Unknown sampling method: {method}")
    # Shuffle the array to avoid the correlation between the parameters
    return np.random.permutation(dist_arr.flatten()).reshape(dist_arr.shape)
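
A usage sketch (assuming, as the scaling code elsewhere in the module suggests, that the samples come back as a (num_points, dimension) array on the unit interval):

from grins.gen_params import sample_distribution

pts = sample_distribution("LHS", dimension=4, num_points=2**8)
pts.shape   # expected: (256, 4)
# Normal/LogNormal accept an optional standard deviation
pts_n = sample_distribution("LogNormal", dimension=2, num_points=2**8, std_dev=0.5)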

grins.gen_params.get_thr_ranges(source_node, topo_df, prange_df, num_params=2 ** 10)

Calculate the median threshold range for a given source node.

Parameters:

  • source_node (str) –

    The source node for which the threshold range is calculated.

  • topo_df (DataFrame) –

    DataFrame containing the topology information.

  • prange_df (DataFrame) –

    DataFrame containing the parameter ranges.

  • num_params (int, default: 2 ** 10 ) –

    Number of parameters to sample. Defaults to 1024.

Returns:

  • float

    The median threshold range for the given source node.

Source code in grins/gen_params.py
def get_thr_ranges(
    source_node: str,
    topo_df: pd.DataFrame,
    prange_df: pd.DataFrame,
    num_params: int = 2**10,
) -> float:
    """
    Calculate the median threshold range for a given source node.

    Parameters
    ----------
    source_node : str
        The source node for which the threshold range is calculated.
    topo_df : pd.DataFrame
        DataFrame containing the topology information.
    prange_df : pd.DataFrame
        DataFrame containing the parameter ranges.
    num_params : int, optional
        Number of parameters to sample. Defaults to 1024.

    Returns
    -------
    float
        The median threshold range for the given source node.
    """
    # Get the production and degradation rates for the source node
    sn_params = prange_df[
        prange_df["Parameter"].str.contains(f"Prod_{source_node}|Deg_{source_node}")
    ]
    # Sample the production and degradation rates for the source node
    sn_gk = sample_param_df(sn_params, num_params)
    # Calculate the g/k values for the source node
    sn_gk_n = (sn_gk[f"Prod_{source_node}"] / sn_gk[f"Deg_{source_node}"]).to_numpy()
    # Get the incoming edges for the source node
    in_edge_topo = topo_df[topo_df["Target"] == source_node]
    # If there are incoming edges, calculate the updated g/k values based on the Hill equation
    if not in_edge_topo.empty:
        isn = "|".join(in_edge_topo["Source"].unique())
        # Get the parameters for the incoming edges
        in_edge_params = prange_df[
            prange_df["Parameter"].str.contains(
                f"Fld_{isn}_{source_node}|Thr_{isn}_{source_node}|Hill_{isn}_{source_node}"
            )
            | prange_df["Parameter"].str.contains(f"Prod_{isn}|Deg_{isn}")
        ]
        # Sample the parameters for the incoming edges
        isn_gk = sample_param_df(
            in_edge_params[in_edge_params["Parameter"].str.contains("Prod_|Deg_")],
            num_params,
        )
        # Calculate the updated g/k values based on the Hill equation for the incoming edges
        for in_node in in_edge_topo["Source"].unique():
            # print(f"Processing incoming edge from {in_node} to {source_node}")
            in_gk = isn_gk[f"Prod_{in_node}"] / isn_gk[f"Deg_{in_node}"]
            in_gk_median = np.median(in_gk)
            # print(f"Median g/k value for {in_node}: {in_gk_median}")
            in_edge_params.loc[
                in_edge_params["Parameter"].str.contains(
                    f"Thr_{in_node}_{source_node}"
                ),
                ["Minimum", "Maximum"],
            ] = [0.02 * in_gk_median, 1.98 * in_gk_median]
        # Update the g/k values for the source node based on the incoming edges and return the median g/k value
        return _get_updated_gkn_hills(sn_gk_n, in_edge_params, in_edge_topo, num_params)
    else:
        # If there are no incoming edges, return the median g/k value for the source node
        return np.median(sn_gk[f"Prod_{source_node}"] / sn_gk[f"Deg_{source_node}"])

grins.gen_params.gen_param_range_df(topo_df, num_params=2 ** 10, sampling_method='Sobol', thr_rows=True)

Generate a parameter range DataFrame from the topology DataFrame.

Parameters:

  • topo_df (DataFrame) –

    The topology DataFrame containing the network structure.

  • num_params (int, default: 2 ** 10 ) –

    The number of parameters to generate. Default is 1024.

  • sampling_method (Union[str, dict], default: 'Sobol' ) –

    The sampling method to use. Can be a string specifying a single method for all parameters or a dictionary specifying methods for individual parameters. Default is "Sobol".

  • thr_rows (bool, default: True ) –

    Whether to add threshold rows to the DataFrame. Default is True.

Returns:

  • DataFrame

    A DataFrame containing the parameter ranges and sampling methods.

Source code in grins/gen_params.py
def gen_param_range_df(
    topo_df: pd.DataFrame,
    num_params: int = 2**10,
    sampling_method: Union[str, dict] = "Sobol",
    thr_rows: bool = True,
) -> pd.DataFrame:
    """
    Generate a parameter range DataFrame from the topology DataFrame.

    Parameters
    ----------
    topo_df : pd.DataFrame
        The topology DataFrame containing the network structure.
    num_params : int, optional
        The number of parameters to generate. Default is 1024.
    sampling_method : Union[str, dict], optional
        The sampling method to use. Can be a string specifying a single method for all parameters or a dictionary specifying methods for individual parameters. Default is "Sobol".
    thr_rows : bool, optional
        Whether to add threshold rows to the DataFrame. Default is True.

    Returns
    -------
    pd.DataFrame
        A DataFrame containing the parameter ranges and sampling methods.
    """
    # Generate the parameter names, target nodes, and source nodes
    param_names, target_nodes, source_nodes = gen_param_names(topo_df)
    # Create a DataFrame to store the parameter ranges
    prange_df = pd.DataFrame({"Parameter": param_names})
    # Set the default minimum and maximum values for the parameters
    prange_df.loc[
        prange_df["Parameter"].str.contains("Prod_"), ["Minimum", "Maximum"]
    ] = [1.0, 100.0]
    prange_df.loc[
        prange_df["Parameter"].str.contains("Deg_"), ["Minimum", "Maximum"]
    ] = [0.1, 1.0]
    prange_df.loc[
        prange_df["Parameter"].str.contains("ActFld_"), ["Minimum", "Maximum"]
    ] = [1.0, 100.0]
    prange_df.loc[
        prange_df["Parameter"].str.contains("InhFld_"), ["Minimum", "Maximum"]
    ] = [0.01, 1.0]
    prange_df.loc[
        prange_df["Parameter"].str.contains("Hill"), ["Minimum", "Maximum"]
    ] = [1.0, 6.0]
    # Set the sampling method for each parameter; if a dictionary is given, apply each method to its matching parameters
    if isinstance(sampling_method, str):
        prange_df["Sampling"] = sampling_method
        if sampling_method in ["Normal", "LogNormal"]:
            prange_df["StdDev"] = 1.0
    else:
        for param, method in sampling_method.items():
            prange_df.loc[prange_df["Parameter"].str.contains(param), "Sampling"] = (
                method
            )
        # If the sampling method is not specified for a parameter, set it to "Sobol"
        prange_df["Sampling"] = prange_df["Sampling"].fillna("Sobol")
        if any(prange_df["Sampling"].isin(["Normal", "LogNormal"])):
            prange_df["StdDev"] = 1.0
    # Fill the threshold rows of the parameter range DataFrame
    if thr_rows:
        prange_df = add_thr_rows(prange_df, topo_df, num_params)
    return prange_df
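
Continuing the hypothetical toggle-switch topo_df from above, a per-parameter sampling override might look like this (dictionary keys are matched via str.contains):

from grins.gen_params import gen_param_range_df

# LogUniform for production-rate parameters, the "Sobol" fallback for the rest
prange_df = gen_param_range_df(topo_df, num_params=2**10, sampling_method={"Prod_": "LogUniform"})
prange_df.head()
# Columns: Parameter, Minimum, Maximum, Sampling (plus StdDev when Normal/LogNormal is used)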

grins.gen_params.gen_param_df(prange_df=None, num_params=2 ** 10, topo_df=None, sampling_method='Sobol', thr_rows=True)

Generate the final parameter DataFrame by sampling parameters. Parameters are grouped by their 'Sampling' (and 'StdDev' if present) so that parameters in the same group are sampled jointly from one higher-dimensional distribution. The final DataFrame columns are arranged in the same order as in prange_df. The sampling methods can be: 'Sobol', 'LHS', 'Uniform', 'LogUniform', 'Normal', 'LogNormal'.

Parameters:

  • prange_df (DataFrame, default: None ) –

    DataFrame with columns ["Parameter", "Minimum", "Maximum", "Sampling", ...].

  • num_params (int, default: 2 ** 10 ) –

    Number of samples to generate per parameter. Default is 1024.

  • topo_df (DataFrame, default: None ) –

    DataFrame containing the network topology information.

  • sampling_method (Union[str, dict], default: 'Sobol' ) –

    The sampling method to use. Can be a string specifying a single method for all parameters or a dictionary specifying methods for individual parameters. Default is "Sobol". The methods can be: 'Sobol', 'LHS', 'Uniform', 'LogUniform', 'Normal', 'LogNormal'.

  • thr_rows (bool, default: True ) –

    Whether to add threshold rows to the DataFrame. Default is True.

Returns:

  • DataFrame

    DataFrame of sampled and scaled parameters.

Source code in grins/gen_params.py
def gen_param_df(
    prange_df: pd.DataFrame = None,
    num_params: int = 2**10,
    topo_df: pd.DataFrame = None,
    sampling_method: Union[str, dict] = "Sobol",
    thr_rows: bool = True,
) -> pd.DataFrame:
    """
    Generate the final parameter DataFrame by sampling parameters.
    Parameters are grouped by their 'Sampling' (and 'StdDev' if present) so that
    parameters in the same group are sampled jointly from one higher-dimensional distribution.
    The final DataFrame columns are arranged in the same order as in prange_df.
    The sampling methods can be: 'Sobol', 'LHS', 'Uniform', 'LogUniform', 'Normal', 'LogNormal'.

    Parameters
    ----------
    prange_df : pd.DataFrame, optional
        DataFrame with columns ["Parameter", "Minimum", "Maximum", "Sampling", ...].
    num_params : int, optional
        Number of samples to generate per parameter. Default is 1024.
    topo_df : pd.DataFrame, optional
        DataFrame containing the network topology information.
    sampling_method : Union[str, dict], optional
        The sampling method to use. Can be a string specifying a single method for all parameters or a dictionary specifying methods for individual parameters. Default is "Sobol". The methods can be: 'Sobol', 'LHS', 'Uniform', 'LogUniform', 'Normal', 'LogNormal'.
    thr_rows : bool, optional
        Whether to add threshold rows to the DataFrame. Default is True.

    Returns
    -------
    pd.DataFrame
        DataFrame of sampled and scaled parameters.
    """
    # If the parameter range dataframe is not given
    if prange_df is None:
        # Generating the parameter range dataframe from the topology of the network
        prange_df = gen_param_range_df(
            topo_df=topo_df,
            num_params=num_params,
            sampling_method=sampling_method,
            thr_rows=thr_rows,
        )
    # Use the sample_param_df function to sample the parameters
    param_df = sample_param_df(prange_df, num_params)
    # Add the ParamNum column to the DataFrame
    param_df = param_df.assign(ParamNum=param_df.index + 1)
    return param_df
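
Either an existing prange_df or a topo_df can be supplied; a minimal sketch with the hypothetical toggle-switch topo_df from above:

from grins.gen_params import gen_param_df

param_df = gen_param_df(topo_df=topo_df, num_params=2**10)   # prange_df built internally
param_df.shape   # expected: 1024 rows, one column per parameter plus ParamNum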

grins.gen_params.gen_init_cond(topo_df, num_init_conds=2 ** 10)

Generate initial conditions for each node based on the topology.

Parameters:

  • topo_df (DataFrame) –

    DataFrame containing the topology information.

  • num_init_conds (int, default: 2 ** 10 ) –

    Number of initial conditions to generate. Default is 2**10.

Returns:

  • DataFrame

    DataFrame containing the generated initial conditions for each node.

Source code in grins/gen_params.py
def gen_init_cond(topo_df: pd.DataFrame, num_init_conds: int = 2**10) -> pd.DataFrame:
    """
    Generate initial conditions for each node based on the topology.

    Parameters
    ----------
    topo_df : pd.DataFrame
        DataFrame containing the topology information.
    num_init_conds : int, optional
        Number of initial conditions to generate. Default is 2**10.

    Returns
    -------
    pd.DataFrame
        DataFrame containing the generated initial conditions for each node.
    """
    _, target_nodes, source_nodes = gen_param_names(topo_df)
    unique_nodes = sorted(set(source_nodes + target_nodes))
    init_conds = _gen_sobol_seq(len(unique_nodes), num_init_conds)
    # Scale initial conditions between 1 and 100
    init_conds = 1 + init_conds * (100 - 1)
    initcond_df = pd.DataFrame(init_conds, columns=unique_nodes)
    # Add a column for the initial condition numbers
    initcond_df = initcond_df.assign(InitCondNum=initcond_df.index + 1)
    return initcond_df
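
A short sketch with the hypothetical toggle-switch topo_df from above; values land in [1, 100] by construction:

from grins.gen_params import gen_init_cond

initcond_df = gen_init_cond(topo_df, num_init_conds=2**7)
initcond_df.columns.tolist()                        # expected: ['A', 'B', 'InitCondNum']
bool((initcond_df[["A", "B"]] >= 1).all().all())    # True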

grins.racipe_run.gen_sim_dirstruct(topo_file, save_dir='.', num_replicates=3)

Generate directory structure for simulation run.

Parameters:

  • topo_file (str) –

    Path to the topo file.

  • save_dir (str, default: '.' ) –

    Directory to save the generated structure. Defaults to ".".

  • num_replicates (int, default: 3 ) –

    Number of replicates to generate. Defaults to 3.

Returns:

  • None

    Directory structure is created with the topo file name and one folder per replicate.

Source code in grins/racipe_run.py
def gen_sim_dirstruct(
    topo_file: str, save_dir: str = ".", num_replicates: int = 3
) -> None:
    """
    Generate directory structure for simulation run.

    Parameters
    -----------
    topo_file : str
        Path to the topo file.
    save_dir : str, optional
        Directory to save the generated structure. Defaults to ".".
    num_replicates : int, optional
        Number of replicates to generate. Defaults to 3.

    Returns
    --------
    None
        Directory structure is created with the topo file name and one folder per replicate.
    """
    # Get the topo file name
    topo_name = topo_file.split("/")[-1].split(".")[0]
    # Create the folder named after the topo file if it does not exist
    os.makedirs(f"{save_dir}/{topo_name}", exist_ok=True)
    # Copy the topo file to the created folder
    subprocess.run(
        [
            "cp",
            topo_file,
            f"{save_dir}/{topo_name}/{topo_file.split('/')[-1]}",
        ]
    )
    # Make the replicate directories
    for rep in range(1, num_replicates + 1):
        os.makedirs(f"{save_dir}/{topo_name}/{rep:03}", exist_ok=True)
    return None
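
For the hypothetical TS.topo from earlier, the resulting layout would be:

from grins.racipe_run import gen_sim_dirstruct

gen_sim_dirstruct("TS.topo", save_dir="sims", num_replicates=2)
# sims/TS/TS.topo
# sims/TS/001/
# sims/TS/002/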

grins.racipe_run.gen_topo_param_files(topo_file, save_dir='.', num_replicates=3, num_params=2 ** 10, num_init_conds=2 ** 7, sampling_method='Sobol')

Generate parameter files for simulation.

Parameters:

  • topo_file (str) –

    The path to the topo file.

  • save_dir (str, default: '.' ) –

    The directory where the parameter files will be saved. Defaults to ".".

  • num_replicates (int, default: 3 ) –

    The number of replicates to generate. Defaults to 3.

  • num_params (int, default: 2 ** 10 ) –

    The number of parameter files to generate. Defaults to 2**10.

  • num_init_conds (int, default: 2 ** 7 ) –

    The number of initial condition files to generate. Defaults to 2**7.

  • sampling_method (Union[str, dict], default: 'Sobol' ) –

    The method to use for sampling the parameter space. Defaults to 'Sobol'. For finer control over parameter generation, see the documentation of the gen_param_range_df and gen_param_df functions.

Returns:

  • None

    The parameter files and initial conditions are generated and saved in the specified replicate directories.

Source code in grins/racipe_run.py
def gen_topo_param_files(
    topo_file: str,
    save_dir: str = ".",
    num_replicates: int = 3,
    num_params: int = 2**10,
    num_init_conds: int = 2**7,
    sampling_method: Union[str, dict] = "Sobol",
):
    """
    Generate parameter files for simulation.

    Parameters
    ----------
    topo_file : str
        The path to the topo file.
    save_dir : str, optional
        The directory where the parameter files will be saved. Defaults to ".".
    num_replicates : int, optional
        The number of replicates to generate. Defaults to 3.
    num_params : int, optional
        The number of parameter files to generate. Defaults to 2**10.
    num_init_conds : int, optional
        The number of initial condition files to generate. Defaults to 2**7.
    sampling_method : Union[str, dict], optional
        The method to use for sampling the parameter space. Defaults to 'Sobol'. For finer control over parameter generation, see the documentation of the gen_param_range_df and gen_param_df functions.

    Returns
    -------
    None
        The parameter files and initial conditions are generated and saved in the specified replicate directories.
    """
    # Get the name of the topo file
    topo_name = topo_file.split("/")[-1].split(".")[0]
    # Parse the topo file
    topo_df = parse_topos(topo_file)
    # Generate the required directory structure
    gen_sim_dirstruct(topo_file, save_dir, num_replicates)
    # Directory where the generated ODE system file will be saved
    sim_dir = f"{save_dir}/{topo_name}"
    # Generate the ODE system for diffrax
    gen_diffrax_odesys(topo_df, topo_name, sim_dir)
    # Generate the parameter dataframe and save in each of the replicate folders
    for rep in range(1, num_replicates + 1):
        # Generate the parameter range dataframe
        param_range_df = gen_param_range_df(
            topo_df, num_params, sampling_method=sampling_method
        )
        # Save the parameter range dataframe
        param_range_df.to_csv(
            f"{sim_dir}/{rep:03}/{topo_name}_param_range_{rep:03}.csv",
            index=False,
            sep="\t",
        )
        # Generate the parameter dataframe with the default values
        param_df = gen_param_df(param_range_df, num_params)
        param_df.to_parquet(
            f"{sim_dir}/{rep:03}/{topo_name}_params_{rep:03}.parquet", index=False
        )
        # Generate the initial conditions dataframe
        initcond_df = gen_init_cond(topo_df=topo_df, num_init_conds=num_init_conds)
        initcond_df.to_parquet(
            f"{sim_dir}/{rep:03}/{topo_name}_init_conds_{rep:03}.parquet",
            index=False,
        )
    print(f"Parameter and Intial Condition files generated for {topo_name}")
    return None
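
An end-to-end sketch with the hypothetical TS.topo:

from grins.racipe_run import gen_topo_param_files

gen_topo_param_files("TS.topo", save_dir="sims", num_replicates=3,
                     num_params=2**10, num_init_conds=2**7)
# sims/TS/TS.py holds the generated ODE system; each sims/TS/<rep>/ folder holds
# TS_param_range_<rep>.csv, TS_params_<rep>.parquet and TS_init_conds_<rep>.parquet.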

grins.racipe_run.load_odeterm(topo_name, simdir)

Loads an ODE system from a specified topology module and returns an ODETerm object.

Parameters:

  • topo_name (str) –

    The name of the topology module to import.

  • simdir (str) –

    The directory path where the topology module is located.

Returns:

  • ODETerm

    An object representing the ODE system.

Raises:

  • ImportError

    If the specified module cannot be imported.

  • AttributeError

    If the module does not contain an attribute named 'odesys'.

Source code in grins/racipe_run.py
def load_odeterm(topo_name, simdir):
    """
    Loads an ODE system from a specified topology module and returns an ODETerm object.

    Parameters
    ----------
    topo_name : str
        The name of the topology module to import.
    simdir : str
        The directory path where the topology module is located.

    Returns
    -------
    ODETerm
        An object representing the ODE system.

    Raises
    ------
    ImportError
        If the specified module cannot be imported.
    AttributeError
        If the module does not contain an attribute named 'odesys'.
    """
    sys.path.append(f"{simdir}")
    mod = import_module(f"{topo_name}")
    term = ODETerm(getattr(mod, "odesys"))
    return term
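
Continuing the hypothetical layout above:

>>> term = load_odeterm("TS", "sims/TS")  # imports sims/TS/TS.py and wraps its odesys in an ODETerm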

grins.racipe_run.topo_simulate(topo_file, replicate_dir, initial_conditions, parameters, t0=0.0, tmax=200.0, dt0=0.01, tsteps=None, rel_tol=1e-05, abs_tol=1e-06, max_steps=2048, batch_size=10000, ode_term_dir=None)

Simulates the ODE system defined by the topology file. The ODE system is loaded as a diffrax ODE term, and the initial conditions and parameters are passed as JAX arrays. The simulation is run over the specified time range and time steps, and the solutions are returned as a DataFrame.

Parameters:

  • topo_file (str) –

    Path to the topology file.

  • replicate_dir (str) –

    Directory where the replicate results will be saved.

  • initial_conditions (DataFrame) –

    DataFrame containing the initial conditions.

  • parameters (DataFrame) –

    DataFrame containing the parameters.

  • t0 (float, default: 0.0 ) –

    Initial time for the simulation. Default is 0.0.

  • tmax (float, default: 200.0 ) –

    Maximum time for the simulation. Default is 200.0.

  • dt0 (float, default: 0.01 ) –

    Initial time step size. Default is 0.01.

  • tsteps (list, default: None ) –

    List of time steps at which to save the results. Default is None.

  • rel_tol (float, default: 1e-05 ) –

    Relative tolerance for the ODE solver. Default is 1e-5.

  • abs_tol (float, default: 1e-06 ) –

    Absolute tolerance for the ODE solver. Default is 1e-6.

  • max_steps (int, default: 2048 ) –

    Maximum number of steps for the ODE solver. Default is 2048.

  • batch_size (int, default: 10000 ) –

    Batch size for processing combinations of initial conditions and parameters. Default is 10000.

  • ode_term_dir (str, default: None ) –

    Directory where the ODE system file is located. Default is None. If None, the parent directory of the replicate directory is assumed to contain the ODE system file. The ODE system file should be named as the topo file with the .py extension.

Returns:

  • DataFrame

    DataFrame containing the solutions of the ODE system.

Source code in grins/racipe_run.py
def topo_simulate(
    topo_file,
    replicate_dir,
    initial_conditions,
    parameters,
    t0=0.0,
    tmax=200.0,
    dt0=0.01,
    tsteps=None,
    rel_tol=1e-5,
    abs_tol=1e-6,
    max_steps=2048,
    batch_size=10000,
    ode_term_dir=None,
):
    """
    Simulates the ODE system defined by the topology file. The ODE system is loaded as a diffrax ODE term, and the initial conditions and parameters are passed as JAX arrays. The simulation is run over the specified time range and time steps, and the solutions are returned as a DataFrame.

    Parameters
    ----------
    topo_file : str
        Path to the topology file.
    replicate_dir : str
        Directory where the replicate results will be saved.
    initial_conditions : pd.DataFrame
        DataFrame containing the initial conditions.
    parameters : pd.DataFrame
        DataFrame containing the parameters.
    t0 : float, optional
        Initial time for the simulation. Default is 0.0.
    tmax : float, optional
        Maximum time for the simulation. Default is 200.0.
    dt0 : float, optional
        Initial time step size. Default is 0.01.
    tsteps : list, optional
        List of time steps at which to save the results. Default is None.
    rel_tol : float, optional
        Relative tolerance for the ODE solver. Default is 1e-5.
    abs_tol : float, optional
        Absolute tolerance for the ODE solver. Default is 1e-6.
    max_steps : int, optional
        Maximum number of steps for the ODE solver. Default is 2048.
    batch_size : int, optional
        Batch size for processing combinations of initial conditions and parameters. Default is 10000.
    ode_term_dir : str, optional
        Directory where the ODE system file is located. Default is None. If None, the parent directory of the replicate directory is assumed to contain the ODE system file. The ODE system file should be named as the topo file with the .py extension.


    Returns
    -------
    pd.DataFrame
        DataFrame containing the solutions of the ODE system.
    """
    # Get the name of the topo file
    topo_name = topo_file.split("/")[-1].split(".")[0]
    # Making sure to remove the trailing slash from the replicate directory path if it exists
    replicate_dir = replicate_dir.rstrip("/")
    # Check if the ode term directory is None
    if ode_term_dir is None:
        # Getting the parent directory of the replicate directory to get the ODE system file
        simul_dir = os.path.dirname(replicate_dir)
        print(f"Loading ODE system from: {simul_dir}")
    else:
        # Making sure to remove the trailing slash from the ode term directory path if it exists
        simul_dir = ode_term_dir.rstrip("/")
    # Load the ODE system as a diffrax ode term
    ode_term = load_odeterm(topo_name, simul_dir)
    # Getting the initial conditions dataframe column names
    ic_columns = initial_conditions.columns
    # Converting the initial conditions and parameters to jax arrays
    initial_conditions = jnp.array(initial_conditions.to_numpy())
    parameters = jnp.array(parameters.to_numpy())
    # Get the combinations of initial conditions and parameters
    icprm_comb = _gen_combinations(len(initial_conditions), len(parameters))
    print(f"Number of combinations to simulate: {len(icprm_comb)}")
    # Processing the time steps
    if tsteps is None:
        saveat = SaveAt(t1=True)
        print(
            f"Running steady state simulations for replicate: {replicate_dir.split('/')[-1]}"
        )
    else:
        # Ensure the time steps are sorted
        tsteps = sorted(tsteps)
        # If the final time step differs from tmax, set tmax to the final step
        if tsteps[-1] != tmax:
            tmax = tsteps[-1]
        # Convert the time steps to a jax array
        tsteps = jnp.array(tsteps)
        # Make the saveat object be the time steps
        saveat = SaveAt(ts=tsteps)
        print(
            f"Running time series simulations for replicate: {replicate_dir.split('/')[-1]}"
        )
    # Specifying the PID controller for the step sizes
    stepsize_controller = PIDController(rtol=rel_tol, atol=abs_tol)
    # Get the functions to solve the ODE system
    solveode_fn = parameterise_solveode(
        ode_term,
        Tsit5(),
        t0,
        tmax,
        dt0,
        saveat,
        stepsize_controller,
        max_steps,
        initial_conditions,
        parameters,
    )
    # Jit compile the solveode function
    solveode_fn = jit(solveode_fn)
    # Create an empty array to store the solutions
    if saveat.subs.ts is None:
        solution_matrix = np.zeros(
            (len(icprm_comb), initial_conditions.shape[1] + 3), dtype=np.float32
        )
    else:
        solution_matrix = np.zeros(
            (len(icprm_comb) * len(saveat.subs.ts), initial_conditions.shape[1] + 2),
            dtype=np.float32,
        )
    # Defining the length of the time steps to properly index the solution matrix
    len_tsteps = len(saveat.subs.ts) if saveat.subs.ts is not None else 1
    # Iterate over the combinations array in batches
    for ip in range(0, len(icprm_comb), batch_size):
        # Get the chunk of the combinations array
        icprm_chunk = icprm_comb[ip : ip + batch_size]
        # vmap the solveode function over the chunk of the combinations array
        sols = vmap(solveode_fn)(icprm_chunk)
        # Vertically stack the solutions
        sols = jnp.vstack(sols)
        # Round the solutions to 4 decimal places
        sols = jnp.round(sols, 4)
        # Add the solutions to the solution matrix at the correct index
        solution_matrix[ip * len_tsteps : (ip * len_tsteps) + len(sols)] = np.array(
            sols
        )
    # Convert the solution matrix to a dataframe
    # Removing the InitCondNum column from the initial conditions
    if saveat.subs.ts is None:
        solution_matrix = pd.DataFrame(
            solution_matrix,
            columns=ic_columns[:-1].tolist()
            + ["Time", "SteadyStateFlag", "InitCondNum", "ParamNum"],
        )
    else:
        solution_matrix = pd.DataFrame(
            solution_matrix,
            columns=ic_columns[:-1].tolist() + ["Time", "InitCondNum", "ParamNum"],
        )
    return solution_matrix
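
A usage sketch against the hypothetical replicate layout produced by gen_topo_param_files:

import pandas as pd
from grins.racipe_run import topo_simulate

params = pd.read_parquet("sims/TS/001/TS_params_001.parquet")
inits = pd.read_parquet("sims/TS/001/TS_init_conds_001.parquet")

# Steady-state run: tsteps=None keeps only the final state of each trajectory
ss_df = topo_simulate("TS.topo", "sims/TS/001", inits, params)

# Time-series run: save the state at chosen time points instead
ts_df = topo_simulate("TS.topo", "sims/TS/001", inits, params,
                      tsteps=[0.0, 50.0, 100.0, 200.0])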

grins.racipe_run.gk_normalise_solutions(sol_df, param_df, threshold=1.01, discretize=False)

Normalises the solutions in the solution dataframe using the maximum production rate (G) and degradation rate (k) parameters of the individual nodes in the parameter sets.

Parameters:

  • sol_df (DataFrame) –

    DataFrame containing the solutions with a ParamNum column to join with param_df.

  • param_df (DataFrame) –

    DataFrame containing the parameters with Prod_ and Deg_ columns for each node.

  • threshold (float, default: 1.01 ) –

    A threshold value: g/k normalised values between 1 and this threshold are clipped to 1, and a warning is raised during discretization if any values exceed it. Default is 1.01.

  • discretize (bool, default: False ) –

    Whether to discretise the solutions. Default is False.

Returns:

  • DataFrame

    DataFrame containing the normalised (and, if discretize=True, discretised) solutions.

  • DataFrame

    DataFrame containing the counts of each state; returned only when discretize=True.

Example

First run the simulation, then normalise and discretise the solutions:

>>> sol_df, state_counts = gk_normalise_solutions(sol_df, params, discretize=True)

Here, the sol_df is the solution dataframe and params is the parameter dataframe. The state_counts dataframe contains the counts of each state in the normalised solutions. The returned sol_df is the normalised and discretised solution dataframe.

Source code in grins/racipe_run.py
def gk_normalise_solutions(sol_df, param_df, threshold=1.01, discretize=False):
    """
    Normalises the solutions in the solution dataframe using the maximum production rate (G) and degradation rate (k) parameters of the individual nodes in the parameter sets.

    Parameters
    ----------
    sol_df : pd.DataFrame
        DataFrame containing the solutions with a `ParamNum` column to join with param_df.
    param_df : pd.DataFrame
        DataFrame containing the parameters with `Prod_` and `Deg_` columns for each node.
    threshold : float, optional
        A threshold value: g/k normalised values between 1 and this threshold are clipped to 1, and a warning is raised during discretization if any values exceed it. Default is 1.01.
    discretize : bool, optional
        Whether to discretise the solutions. Default is False.

    Returns
    -------
    pd.DataFrame
        DataFrame containing the normalised (and, if discretize=True, discretised) solutions.
    pd.DataFrame
        DataFrame containing the counts of each state; returned only when discretize=True.

    Example
    -------
    First run the simulation, then normalise and discretise the solutions:

        >>> sol_df, state_counts = gk_normalise_solutions(sol_df, params, discretize=True)

    Here, the `sol_df` is the solution dataframe and `params` is the parameter dataframe.
    The `state_counts` dataframe contains the counts of each state in the normalised solutions. The returned `sol_df` is the normalised and discretised solution dataframe.
    """
    # Get the node columns from the solution dataframe
    node_cols = [col.replace("Prod_", "") for col in param_df.columns if "Prod_" in col]
    # Get the production and degradation columns
    prod_cols = [f"Prod_{node}" for node in node_cols]
    deg_cols = [f"Deg_{node}" for node in node_cols]
    # Get the gk columns
    gk_cols = [f"gk_{node}" for node in node_cols]
    # Compute the gk values
    param_df[gk_cols] = param_df[prod_cols].values / param_df[deg_cols].values
    # Join the solution dataframe with the parameter dataframe on the 'ParamNum' column
    norm_df = sol_df.merge(param_df[gk_cols + ["ParamNum"]], on="ParamNum")
    # Divide the node columns by the gk columns
    norm_df[gk_cols] = norm_df[node_cols].values / norm_df[gk_cols].values
    if discretize:
        # Select the node columns and discretise the solutions
        norm_df = pd.concat(
            [norm_df, discretise_solutions(norm_df[gk_cols], threshold)], axis=1
        )
        # Get the State columns and return the state counts
        state_counts = norm_df["State"].value_counts()
        # Make the State column as a column
        state_counts = state_counts.reset_index()
        # Rename the columns
        state_counts.columns = ["State", "Count"]
        # Return the normalised and discretised solution dataframe
        return norm_df, state_counts
    else:
        return norm_df

grins.racipe_run.discretise_solutions(norm_df, threshold=1.01)

Discretises the solutions in a g/k normalized DataFrame based on histogram peaks and minima.

Parameters:

  • norm_df (DataFrame) –

    DataFrame containing normalized values to be discretized. It should include only the g/k normalized columns, as the presence of other columns may lead to spurious results.

  • threshold (float, default: 1.01 ) –

    A threshold value: g/k normalised values between 1 and this threshold are clipped to 1, and a warning is raised during discretization if any values exceed it. If the parameter sets are such that the maximum possible expression of a node is not production/degradation, the threshold value needs to be adjusted accordingly. Default is 1.01.

Returns:

  • Series

    A Series with the name "State" containing the discrete state labels for each row in the input DataFrame. The order of the labels in the state string is the same as the input column order.

Raises:

  • UserWarning

    If any value in the DataFrame exceeds the specified threshold value.

Example

Given a normalized DataFrame norm_df, discretise the values

>>> lvl_df = discretise_solutions(norm_df)

The normalized solution DataFrame contains values of the nodes between 0 (lowest) and 1 (highest). The returned lvl_df will have discrete state labels for each row in the input DataFrame.

Raises a warning if any node values exceed the threshold value. This can occur when a node starts with a value higher than its g/k ratio and the simulation is stopped before reaching steady state, even though the value is approaching the correct limit.

In time-series simulations with discretization, similar warnings may occur if initial conditions or intermediate values temporarily exceed the g/k threshold. Additionally, it is important to ensure that the time points and solver tolerance settings are appropriately configured, as improper settings can lead to NaN values in the time series.

For steady-state simulations, increasing the solver's relative and absolute tolerances can improve convergence and reduce such warnings by allowing the simulation to more accurately reach the true steady state.

Source code in grins/racipe_run.py
def discretise_solutions(norm_df, threshold=1.01):
    """
    Discretises the solutions in a g/k normalized DataFrame based on histogram peaks and minima.

    Parameters
    ----------
    norm_df : pd.DataFrame
        DataFrame containing normalized values to be discretized. It should include only the g/k normalized columns, as the presence of other columns may lead to spurious results.
    threshold : float, optional
        A threshold value: g/k normalised values between 1 and this threshold are clipped to 1, and a warning is raised during discretization if any values exceed it. If the parameter sets are such that the maximum possible expression of a node is not production/degradation, the threshold value needs to be adjusted accordingly. Default is 1.01.

    Returns
    -------
    pd.Series
        A Series with the name "State" containing the discrete state labels for each row in the input DataFrame. The order of the labels in the state string is the same as the input column order.

    Raises
    ------
    UserWarning
        If any value in the DataFrame exceeds the specified threshold value.

    Example
    -------
    Given a normalized DataFrame ```norm_df```, discretise the values

        >>> lvl_df = discretise_solutions(norm_df)

    The normalized solution DataFrame contains values of the nodes between 0 (lowest) and 1 (highest).
    The returned `lvl_df` will have discrete state labels for each row in the input DataFrame.

    Raises a warning if any node values exceed the threshold value. This can occur when a node starts with a value higher than its g/k ratio and the simulation is stopped before reaching steady state, even though the value is approaching the correct limit.

    In time-series simulations with discretization, similar warnings may occur if initial conditions or intermediate values temporarily exceed the g/k threshold. Additionally, it is important to ensure that the time points and solver tolerance settings are appropriately configured, as improper settings can lead to NaN values in the time series.

    For steady-state simulations, increasing the solver's relative and absolute tolerances can improve convergence and reduce such warnings by allowing the simulation to more accurately reach the true steady state.

    """
    # Flatten the values of the dataframe into a 1D array
    flat = norm_df.values.flatten()

    # Add dummy values to ensure boundary peaks are detected
    dummy_low = np.full(10, -0.1)
    dummy_high = np.full(10, 1.1)
    data_with_dummy = np.concatenate([dummy_low, flat, dummy_high])

    # Compute histogram over 120 bins
    flat_hist, bin_edges = np.histogram(data_with_dummy, bins=120)

    # Define threshold for peak and minima detection (1% of data length)
    peak_detection_threshold = int(len(flat) * 0.01)

    # Detect peaks in the histogram
    peaks, _ = find_peaks(
        flat_hist,
        height=peak_detection_threshold,
        threshold=peak_detection_threshold,
        prominence=peak_detection_threshold,
        distance=int(len(bin_edges) * 0.1),
    )
    maxima_bins = bin_edges[peaks]

    # Detect minima by inverting the histogram (skip first and last bin)
    minima_indices, _ = find_peaks(
        np.max(flat_hist) - flat_hist[1:-1], height=np.max(flat_hist) * 0.99
    )
    # Adjust indices (because we skipped the first bin)
    minima_indices = [idx + 1 for idx in minima_indices]

    # Initialize minima bins with the boundaries 0.0 and 1.0
    minima_bins = [0.0, 1.0]

    # For multiple peaks, find a representative (median) minimum between each adjacent pair
    if len(maxima_bins) > 1:
        for i in range(1, len(maxima_bins)):
            # Candidate minima between two successive peaks
            candidates = [
                bin_edges[idx]
                for idx in minima_indices
                if maxima_bins[i - 1] <= bin_edges[idx] <= maxima_bins[i]
            ]
            if candidates:
                median_candidate = candidates[len(candidates) // 2]
                minima_bins.append(median_candidate)
    # If only one peak exists, take the median minimum if available
    elif minima_indices:
        median_candidate = bin_edges[minima_indices[len(minima_indices) // 2]]
        minima_bins.append(median_candidate)

    # Ensure the bin edges are sorted
    minima_bins = np.sort(minima_bins)

    # Clip values slightly above 1.0 (small numerical overshoots) back to 1.0
    norm_df = norm_df.mask((norm_df > 1) & (norm_df < threshold), 1.0)

    # If any value exceeds the hard threshold, raise a warning
    if (norm_df > threshold).any().any():
        warnings.warn(
            f"Some g/k values exceed the hard threshold of {threshold}, check your input data. Potential reason if running with default values would be solver accuracy - lower rel_tol and abs_tol values.",
            category=UserWarning,
        )

    # Use vectorized binning to assign each value to a discrete level
    # The number of levels is len(minima_bins)-1 (each interval defines one level)
    lvl_df = norm_df.apply(
        lambda col: pd.cut(
            col,
            bins=minima_bins,
            labels=range(len(minima_bins) - 1),
            include_lowest=True,
        )
    )
    lvl_df = lvl_df.add_prefix("Lvl_")

    # Create a 'State' column by concatenating the discrete levels as strings
    lvl_df["State"] = "'" + lvl_df.astype(str).apply("".join, axis=1) + "'"

    return lvl_df["State"]

grins.racipe_run.run_all_replicates(topo_file, save_dir='.', t0=0.0, tmax=200.0, dt0=0.01, tsteps=None, rel_tol=1e-05, abs_tol=1e-06, max_steps=2048, batch_size=10000, normalize=True, discretize=True, gk_threshold=1.01)

Run simulations for all replicates of the specified topo file. The initial conditions and parameters are loaded from the replicate folders. The directory structure is assumed to match that generated by the gen_topo_param_files function: a main directory named after the topo file, containing the parameter range files, the ODE system file, and the replicate folders with the initial conditions and parameters dataframes.

Parameters:

  • topo_file (str) –

    Path to the topology file.

  • save_dir (str, default: '.' ) –

    Directory where the replicate folders are saved. Defaults to ".", i.e. the current working directory.

  • t0 (float, default: 0.0 ) –

    Initial time for the simulation. Defaults to 0.0.

  • tmax (float, default: 200.0 ) –

    Maximum time for the simulation. Defaults to 200.0.

  • dt0 (float, default: 0.01 ) –

    Initial time step for the simulation. Defaults to 0.01.

  • tsteps (list, default: None ) –

    List of time points at which to save the results. Defaults to None.

  • rel_tol (float, default: 1e-05 ) –

    Relative tolerance for the simulation. Defaults to 1e-5.

  • abs_tol (float, default: 1e-06 ) –

    Absolute tolerance for the simulation. Defaults to 1e-6.

  • max_steps (int, default: 2048 ) –

    Maximum number of steps for the simulation. Defaults to 2048.

  • batch_size (int, default: 10000 ) –

    Batch size for the simulation. Defaults to 10000.

  • normalize (bool, default: True ) –

    Whether to g/k normalise the solutions. Defaults to True.

  • discretize (bool, default: True ) –

    Whether to discretize the solutions. Defaults to True.

  • gk_threshold (float, default: 1.01 ) –

    A hard threshold value: g/k normalised values between 1 and this threshold are clipped to 1, and a warning is raised during discretization if any values exceed it. Default is 1.01.

Returns:

  • None

    The results of the simulation are saved in the replicate folders in the specified directory.

Note

The results of the simulation are saved in the replicate folders in the specified directory. If the simulation is time series, the results are saved as timeseries_solutions.parquet and if the simulation is steady state, the results are saved as steadystate_solutions.parquet.

Normalization and discretization of the solutions are optional features. Note that discretization always normalizes internally; if normalize=False, the g/k columns are simply dropped afterwards.

Behavior based on the discretize and normalize flags:

  • If discretize=True, the normalized solutions are discretized, and state counts are saved to {topo_name}_steadystate_state_counts_{replicate_base}.csv. This applies only to steady-state simulations.

  • If discretize=False, the solutions are normalized but not discretized.

Effect on the final solution DataFrame:

  • If only discretize=True, a State column is added. The order of the levels in the state string will be the same as the order of the node columns.
  • If only normalize=True, additional columns are added for each node containing the g/k-normalized values. The column names corresponding to the g/k normalised values will have the format "gk_{node_name}".
  • If both flags are set to True, both the State column and the normalized value columns are included; see the fuller usage sketch after the example below.
Example

Run the simulation for the specified topo file

>>> run_all_replicates(topo_file, save_dir, t0, tmax, dt0, tsteps, rel_tol, abs_tol, max_steps, batch_size)
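
A fuller, hedged end-to-end sketch - the TS.topo file name, results directory, and replicate index 1 are assumptions for illustration; the output file pattern follows the source below:

import pandas as pd
from grins.racipe_run import run_all_replicates

# Assumed inputs -- substitute your own topology file and output directory
topo_file = "TS.topo"
save_dir = "results"

# Steady-state run (tsteps=None) with g/k normalization and discretization
run_all_replicates(
    topo_file,
    save_dir=save_dir,
    tsteps=None,
    rel_tol=1e-5,
    abs_tol=1e-6,
    normalize=True,
    discretize=True,
)

# Results land in <save_dir>/<topo_name>/<replicate>/; with both flags on,
# the DataFrame has the raw node columns, the gk_<node> columns, and State
sol_df = pd.read_parquet("results/TS/1/TS_steadystate_solutions_1.parquet")
print(sol_df["State"].value_counts())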
Source code in grins/racipe_run.py
def run_all_replicates(
    topo_file,
    save_dir=".",
    t0=0.0,
    tmax=200.0,
    dt0=0.01,
    tsteps=None,
    rel_tol=1e-5,
    abs_tol=1e-6,
    max_steps=2048,
    batch_size=10000,
    normalize=True,
    discretize=True,
    gk_threshold=1.01,
):
    """
    Run simulations for all replicates of the specified topo file. The initial conditions and parameters are loaded from the replicate folders. The directory structure is assumed to match the one generated by the gen_topo_param_files function: a main directory named after the topo file, containing the parameter range file, the ODE system file, and the replicate folders with the initial-condition and parameter dataframes.

    Parameters
    ----------
    topo_file : str
        Path to the topology file.
    save_dir : str, optional
        Directory where the replicate folders are saved. Defaults to "." i.e current working directory.
    t0 : float, optional
        Initial time for the simulation. Defaults to 0.0.
    tmax : float, optional
        Maximum time for the simulation. Defaults to 200.0.
    dt0 : float, optional
        Initial time step for the simulation. Defaults to 0.01.
    tsteps : int, optional
        Number of time steps for the simulation. Defaults to None.
    rel_tol : float, optional
        Relative tolerance for the simulation. Defaults to 1e-5.
    abs_tol : float, optional
        Absolute tolerance for the simulation. Defaults to 1e-6.
    max_steps : int, optional
        Maximum number of steps for the simulation. Defaults to 2048.
    batch_size : int, optional
        Batch size for the simulation. Defaults to 10000.
    normalize : bool, optional
        Whether to g/k normalise the solutions. Defaults to True.
    discretize : bool, optional
        Whether to discretize the solutions. Defaults to True.
    gk_threshold : float, optional
        A hard threshold: g/k normalised values above 1 but below this threshold are clipped to 1, and a warning is issued during discretization if any g/k normalised values exceed it. Default is 1.01.

    Returns
    -------
    None
        The results of the simulation are saved in the replicate folders in the specified directory.

    Note
    ----
    The results of the simulation are saved in the replicate folders in the specified directory. If `tsteps` is given, the run is a time series and the results are saved as `{topo_name}_timeseries_solutions_{replicate_base}.parquet`; if `tsteps` is None, the run is steady state and the results are saved as `{topo_name}_steadystate_solutions_{replicate_base}.parquet`.

    Normalization and discretization of the solutions are optional features. Note that discretization always normalizes internally; if `normalize=False`, the g/k columns are simply dropped afterwards.

    Behavior based on the `discretize` and `normalize` flags:

    - If `discretize=True`, the normalized solutions are discretized, and state counts are saved to `{topo_name}_steadystate_state_counts_{replicate_base}.csv`. This applies only to steady-state simulations.

    - If `discretize=False`, the solutions are normalized but not discretized.

    Effect on the final solution DataFrame:

    - If only `discretize=True`, a `State` column is added. The order of the levels in the state string will be the same as the order of the node columns.
    - If only `normalize=True`, additional columns are added for each node containing the g/k-normalized values. The column names corresponding to the g/k normalised values will have the format "gk_{node_name}".
    - If both flags are set to `True`, both the `State` column and the normalized value columns are included.

    Example
    -------
    Run the simulation for the specified topo file

        >>> run_all_replicates(topo_file, save_dir, t0, tmax, dt0, tsteps, rel_tol, abs_tol, max_steps, batch_size)
    """
    # Get the name of the topo file
    topo_name = topo_file.split("/")[-1].split(".")[0]
    # Get the list of replicate folders
    replicate_folders = sorted(
        [
            folder
            for folder in glob(f"{save_dir}/{topo_name}/*/")
            if os.path.basename(folder.rstrip("/")).isdigit()
        ],
    )
    # Loop through the replicate folders and run the simulation for each replicate
    for replicate_dir in replicate_folders:
        # Getting the base name of the replicate directory
        replicate_base = os.path.basename(replicate_dir.rstrip("/"))
        # Load the initial conditions and parameters dataframes
        init_cond_path = (
            f"{replicate_dir}/{topo_name}_init_conds_{replicate_base}.parquet"
        )
        params_path = f"{replicate_dir}/{topo_name}_params_{replicate_base}.parquet"
        # Read the initial conditions and parameters dataframes
        init_conds = pd.read_parquet(init_cond_path)
        params = pd.read_parquet(params_path)
        # Starting the timer
        start_time = time.time()
        # Run the simulation for the specified topo file and given initial conditions and parameters
        sol_df = topo_simulate(
            topo_file=topo_file,
            replicate_dir=replicate_dir,
            initial_conditions=init_conds,
            parameters=params,
            t0=t0,
            tmax=tmax,
            dt0=dt0,
            tsteps=tsteps,
            rel_tol=rel_tol,
            abs_tol=abs_tol,
            max_steps=max_steps,
            batch_size=batch_size,
        )
        # Report the elapsed wall-clock time for this replicate
        print(f"Time taken for replicate {replicate_base}: {time.time() - start_time}")
        if discretize:
            print("Normalising and Discretising the solutions")
            sol_df, state_counts = gk_normalise_solutions(
                sol_df,
                params,
                discretize=discretize,
                threshold=gk_threshold,
            )
            # Drop the g/k columns if normalization was not requested
            if not normalize:
                sol_df = sol_df.drop(
                    columns=[col for col in sol_df.columns if col.startswith("gk_")]
                )
        elif normalize:
            print("Normalising the solutions")
            # G/k normalise the solution dataframe (discretize is False in this branch)
            sol_df = gk_normalise_solutions(
                sol_df, params, discretize=discretize, threshold=gk_threshold
            )
        # Name the output file based on whether a time series was requested (tsteps given) or a steady-state run (tsteps is None)
        if tsteps is None:
            # Save the solution dataframe
            sol_df.to_parquet(
                f"{replicate_dir}/{topo_name}_steadystate_solutions_{replicate_base}.parquet"
            )
            if discretize:
                state_counts.to_csv(
                    f"{replicate_dir}/{topo_name}_steadystate_state_counts_{replicate_base}.csv",
                    index=False,
                    sep="\t",
                )
        else:
            # Save the time-series solution dataframe
            sol_df.to_parquet(
                f"{replicate_dir}/{topo_name}_timeseries_solutions_{replicate_base}.parquet"
            )
        print(f"Simulation completed for replicate: {replicate_base}\n")
    return None
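
Since the state counts are written tab-separated (see the to_csv call above), here is a minimal loading sketch; the path is an assumption matching the earlier example:

import pandas as pd

# Assumed path for the "TS" topology, replicate 1 (tab-separated, no index)
counts = pd.read_csv("results/TS/1/TS_steadystate_state_counts_1.csv", sep="\t")
print(counts.head())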