A method and device for drug delivery are provided, in particular, though not exclusively, for the administration of anaesthetic to a patient. A state associated with the patient is determined based on a value of at least one parameter associated with a condition of the patient. The state corresponds to a point in a continuous state space comprising possible states. A reward function is provided for calculating a reward. The reward function comprises a function of state and action, wherein an action is associated with an amount of substance to be administered to the patient. The action corresponds to a point in a continuous action space comprising all possible actions. A policy function is provided which defines an action to be taken as a function of state, and the policy function is adjusted using reinforcement learning to maximize an expected accumulated reward.
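
The arrangement described above can be illustrated with a minimal sketch. It is not the claimed implementation: the text does not name a particular reinforcement learning algorithm, so a simple REINFORCE-style policy gradient is assumed here purely as one example. The names `TARGET`, `reward`, `toy_patient_step`, `policy_mean`, `run_episode` and `train`, the one-dimensional patient model, and all numeric constants are hypothetical and chosen only for illustration. The state and the action are both real-valued, so both spaces are continuous.

```python
import numpy as np

TARGET = 50.0  # hypothetical target value of the monitored patient parameter


def reward(state, action):
    """Reward as a function of state and action (assumed form): penalise
    distance from the target and, lightly, the amount administered."""
    return -((state - TARGET) / 10.0) ** 2 - 0.01 * action ** 2


def toy_patient_step(state, action, rng):
    """Hypothetical patient response, for illustration only: the parameter
    drifts upward and is lowered by the administered amount."""
    return state + 2.0 - action + rng.normal(0.0, 0.5)


def policy_mean(theta, state):
    """Policy function: maps a continuous state to a continuous action."""
    return theta[0] + theta[1] * (state - TARGET) / 10.0


def run_episode(theta, rng, sigma=0.3, steps=30):
    """Roll out a Gaussian exploration policy around policy_mean."""
    state = rng.uniform(40.0, 90.0)
    grads, rewards = [], []
    for _ in range(steps):
        mean = policy_mean(theta, state)
        action = mean + sigma * rng.normal()
        features = np.array([1.0, (state - TARGET) / 10.0])
        # Gradient of log N(action; mean, sigma^2) with respect to theta.
        grads.append((action - mean) / sigma ** 2 * features)
        applied = max(action, 0.0)  # an administered amount cannot be negative
        rewards.append(reward(state, applied))
        state = toy_patient_step(state, applied, rng)
    return grads, rewards


def train(iterations=200, episodes=16, lr=0.02, seed=0):
    """Adjust the policy parameters to increase expected accumulated reward."""
    rng = np.random.default_rng(seed)
    theta = np.zeros(2)
    for _ in range(iterations):
        all_grads, all_returns = [], []
        for _ in range(episodes):
            grads, rewards = run_episode(theta, rng)
            # Reward-to-go: reward accumulated from each step onward.
            returns = np.cumsum(rewards[::-1])[::-1]
            all_grads.extend(grads)
            all_returns.extend(returns)
        g = np.array(all_grads)
        r = np.array(all_returns)
        # Normalised returns act as a baseline and keep the step size stable.
        advantages = (r - r.mean()) / (r.std() + 1e-8)
        theta = theta + lr * (g * advantages[:, None]).mean(axis=0)
    return theta


if __name__ == "__main__":
    print("learned policy parameters:", train())
```

In this sketch the policy is linear in the state and exploration noise is Gaussian. A device of the kind described would presumably use a richer state, a validated patient model or clinical data, and safety limits on the administered amount, none of which are modelled here.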