The increasing data requirement of Internet applications has driven a dramatic surge in developing new programming paradigms and complex scheduling algorithms to handle data-intensive workloads. Due to the expanding volume and the variety of such flows, their raw data are often processed on intermediate processing nodes before being sent to servers. The intermediate processing constraints are however not yet considered in existing task and flow computing models. In this paper, we aim to minimize the total tardiness of all flows in the presence of intermediate processing constraints. We build a model to consider Tardiness-aware Flow Scheduling with Processing constraints (TFS-P), which is unfortunately NP-Hard. Hence, we propose a heuristic Routing and Scheduling duplex MATching (RSMAT) framework based on the classic Gale-Shapley Matching Theory. We find that the problem can be well-addressed by classic Deferred Acceptance (DA) algorithm, in which the match is stable but inefficient for the model. We therefore propose the Tardiness-aware Deferred Acceptance algorithm with Dynamical Quota (TDA-DQ). This algorithm is enhanced by overcoming the inefficient stability and smartly considering the dynamical quota in the system. The evaluation compares TDA-DQ to the lower bound obtained by a modified subgradient optimization algorithm. The result indicates that TDA-DQ can achieve near-optimal performance for data-intensive applications.