NLP model – statement clustering
Project Idea Metadata
- Project Idea Name: NLP model – statement clustering
- Date: 6/19/2022 1:03:19 PM
- Administrators:
Project Idea Description
Initial situation and problem definition
In a current practical project, statements about different spatial planning options are collected. Each statement belongs to one of four types: pro, contra, question, or risk. A statement can occur several times, and each occurrence belongs to exactly one option. Annotations additionally record the topical context. Examples of such statements:
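Statement | Option | Type | Annotation
Die Kosten sind zu hoch (The costs are too high) | 1 | Kontra (contra) | Finanzen (finances)
Wie hoch sind die Kosten einer solchen Lösung? (How high are the costs of such a solution?) | 2 | Frage (question) | Finanzen (finances)
Als Steuerzahler nicht vertretbar (Not justifiable as a taxpayer) | 1 | Kontra (contra) | Finanzen (finances)
Die Hohe Lärmbelastung muss geprüft werden. (The high noise exposure must be examined.) | 3 | Risiko (risk) | Immissionen (immissions)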
This is a specific project of a Swiss municipality in which various experts are involved. Today, the clustering of such statements by type is done manually, which is time-consuming.
Aim of the work and expected results
In the concrete use case, about 400 statements have already been manually assigned to one of the four possible types. To optimize the process, an NLP model shall be developed that automatically suggests the probability of each of those four types for newly entered statements. The resulting suggestions could be used to support the manual process.
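To make "suggesting the probability of each type" concrete, the following is a minimal sketch of one possible baseline: a bag-of-words multinomial naive Bayes classifier written with the Python standard library only. It is an illustration under assumptions, not the method prescribed by this project. The function names (`tokenize`, `train`, `predict_proba`), the smoothing value, and the tokenization are hypothetical choices, and the training data consists only of the four example statements given earlier (none of which is of type pro).

```python
import math
import re
from collections import Counter, defaultdict

# The four statement types named in the project description.
TYPES = ("pro", "contra", "question", "risk")


def tokenize(text):
    """Lowercase the text and split it into word tokens; '?' is kept as a token."""
    return re.findall(r"\w+|\?", text.lower())


def train(examples, alpha=1.0):
    """Count words per type. `examples` is an iterable of (statement, type) pairs."""
    doc_counts = Counter()
    word_counts = defaultdict(Counter)
    vocab = set()
    for text, label in examples:
        tokens = tokenize(text)
        doc_counts[label] += 1
        word_counts[label].update(tokens)
        vocab.update(tokens)
    return {"doc_counts": doc_counts, "word_counts": word_counts,
            "vocab": vocab, "alpha": alpha}


def predict_proba(model, text):
    """Return a dict {type: probability}; the probabilities sum to 1."""
    alpha = model["alpha"]
    vocab_size = len(model["vocab"])
    total_docs = sum(model["doc_counts"].values())
    log_scores = {}
    for label in TYPES:
        # Smoothed prior, so a type without training examples keeps a non-zero probability.
        prior = (model["doc_counts"][label] + alpha) / (total_docs + alpha * len(TYPES))
        score = math.log(prior)
        total_words = sum(model["word_counts"][label].values())
        for token in tokenize(text):
            if token not in model["vocab"]:
                continue  # words never seen during training carry no evidence
            count = model["word_counts"][label][token]
            score += math.log((count + alpha) / (total_words + alpha * vocab_size))
        log_scores[label] = score
    # Normalize in log space to avoid underflow for long statements.
    max_score = max(log_scores.values())
    weights = {label: math.exp(s - max_score) for label, s in log_scores.items()}
    norm = sum(weights.values())
    return {label: w / norm for label, w in weights.items()}


# Toy training set: only the example statements from the project description.
examples = [
    ("Die Kosten sind zu hoch", "contra"),
    ("Wie hoch sind die Kosten einer solchen Lösung?", "question"),
    ("Als Steuerzahler nicht vertretbar", "contra"),
    ("Die Hohe Lärmbelastung muss geprüft werden.", "risk"),
]
model = train(examples)
probabilities = predict_proba(model, "Wie hoch sind die Kosten?")
for label, p in sorted(probabilities.items(), key=lambda item: -item[1]):
    print(f"{label}: {p:.3f}")
```

With only four training statements the numbers are not meaningful; the point is the output shape, one probability per type, which an expert could accept or overrule. The same structure could be trained with option numbers as labels instead of types.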
Evaluating which method is best suited for this suggestion task (e.g., fuzzy clustering, label assignment, or others) is part of the work and shall be documented.
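One way such a comparison could be set up is k-fold cross-validation on the already labelled statements, so that every candidate method is scored on data it was not fitted on. The sketch below is an assumption about how this might look, not a prescribed procedure. The helper `k_fold_accuracy` and the majority-class baseline are hypothetical, and the data is placeholder text rather than project data.

```python
import random
from collections import Counter


def k_fold_accuracy(examples, fit, predict, k=5, seed=0):
    """Mean accuracy of one method over k folds.

    fit(train_examples) returns a model;
    predict(model, statement) returns the predicted type.
    """
    shuffled = list(examples)
    if len(shuffled) < k:
        raise ValueError("need at least k labelled examples")
    random.Random(seed).shuffle(shuffled)  # fixed seed keeps the split reproducible
    folds = [shuffled[i::k] for i in range(k)]
    accuracies = []
    for i, test_fold in enumerate(folds):
        # Train on every fold except the one held out for testing.
        train_set = [ex for j, fold in enumerate(folds) if j != i for ex in fold]
        model = fit(train_set)
        correct = sum(predict(model, text) == label for text, label in test_fold)
        accuracies.append(correct / len(test_fold))
    return sum(accuracies) / len(accuracies)


# Baseline that any candidate method should beat: always predict the most
# frequent type seen in the training folds.
def fit_majority(train_examples):
    return Counter(label for _, label in train_examples).most_common(1)[0][0]


def predict_majority(model, statement):
    return model


# Placeholder data (NOT project data): 8 contra and 2 question statements.
placeholder = [(f"placeholder statement {i}", "contra") for i in range(8)]
placeholder += [(f"placeholder question {i}", "question") for i in range(2)]

baseline_accuracy = k_fold_accuracy(placeholder, fit_majority, predict_majority)
print(f"majority baseline accuracy: {baseline_accuracy:.2f}")
```

Any candidate that can be wrapped in a `fit`/`predict` pair can be passed to the same function, which keeps the comparison between methods on equal footing. Accuracy is only one possible score; since the model is meant to output probabilities, a probability-aware measure may be more appropriate.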
In addition, a second model can optionally be trained to statistically infer the underlying spatial planning option from a newly entered statement.
The evaluations of the currently running surveys are planned for November 2022. Those surveys present an opportunity to get additional input from professionals. The collected feedback shall be used to further develop the software and improve its performance.
For convenient use of the developed model(s), an API can optionally be implemented so that the computations run on a server. A minimal matching user interface, possibly web-based, could simplify access and turn the prototype into an initial browser application.
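As a rough illustration of what such an optional API could look like, here is a minimal sketch using only Python's standard-library `http.server`. Everything specific in it is an assumption: the `/suggest` path, the JSON field names, the port, and the `suggest_types` function, which is a placeholder returning a uniform distribution where a trained model would be called.

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer

TYPES = ("pro", "contra", "question", "risk")


def suggest_types(statement):
    """Placeholder for a trained model: returns the same probability for every type."""
    return {t: 1.0 / len(TYPES) for t in TYPES}


def handle_request(body):
    """Turn a raw JSON request body into a (status_code, response_dict) pair."""
    try:
        payload = json.loads(body)
    except ValueError:  # covers malformed JSON and undecodable bytes
        return 400, {"error": "body must be valid JSON"}
    statement = payload.get("statement") if isinstance(payload, dict) else None
    if not isinstance(statement, str) or not statement.strip():
        return 400, {"error": "field 'statement' must be a non-empty string"}
    return 200, {"statement": statement, "suggestions": suggest_types(statement)}


class SuggestHandler(BaseHTTPRequestHandler):
    """Answers POST /suggest with type probabilities as JSON."""

    def do_POST(self):
        if self.path != "/suggest":
            self.send_error(404)
            return
        length = int(self.headers.get("Content-Length", 0))
        status, response = handle_request(self.rfile.read(length))
        data = json.dumps(response, ensure_ascii=False).encode("utf-8")
        self.send_response(status)
        self.send_header("Content-Type", "application/json; charset=utf-8")
        self.send_header("Content-Length", str(len(data)))
        self.end_headers()
        self.wfile.write(data)


def serve(host="127.0.0.1", port=8000):
    """Start the server; blocks until interrupted."""
    HTTPServer((host, port), SuggestHandler).serve_forever()
```

Calling `serve()` starts the server locally; a browser front end could then send `{"statement": "..."}` to `/suggest` and display the returned probabilities. Keeping the request logic in `handle_request`, separate from the HTTP handler, lets it be tested without opening a socket.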
Desired methods, procedure
Through literature research, a deductive procedure has to be used to develop a hypothesis and a prediction about the possibilities of training an NLP model for this specific task. Using desk research, data from practice and research shall be collected, compared, and described. The results are to be confirmed or refuted by qualitative methods such as prototyping, interviews, or group discussions.
The realization has to follow an incremental procedure that allows the requirements to be prioritized continuously and unforeseen changes to be accommodated. Risks emerging from this approach should be documented and addressed promptly.
Creativity, variations, innovation
Defining the most appropriate approaches for the data analysis and the model development constitutes the exploratory part of this work and should be documented in the final work. Since the prototype is planned to be accessed as a web-based application through an API, the tools and technologies used to develop it can be chosen freely.
In a current practical project, statements about different spatial planning options are collected and clustered manually by experts. The aim of this work is to research possibilities to achieve this clustering with the help of a Natural Language Processing (NLP) model in order to support these experts. Different approaches are to be compared in order to find an optimal solution for the final prototype.