Session Proposals

How to leverage OpenRefine for a variety of user experiences?

OpenRefine has developed robust capabilities for key operations for data import, wrangling and ingestion for several platforms. Different user segments have needs that the technical, complex and unintuitive interfaces cannot meet.

Coming from the Wikimedia / GLAM partnerships domain, I would love to see how different modalities could be used for reconciliation: geographic comparison, comparison of source materials, visual cues, exact matches via authority IDs, AI suggestions, use of related information, etc.

It is obvious that OpenRefine development cannot meet all the possible needs of all the user segments, but what kind of architecture would allow building on top of OpenRefine, or connecting with its capabilities, to create tools for specific purposes while taking advantage of all current capabilities? What kind of production structure would be needed, and who should coordinate and fund it?

Even if this overlaps with other proposals, the point of this proposal is to focus on facilitating a variety of user experiences.

Benjamin Rosemann

OpenRefine as a service

When you browse GitHub or GitLab you can start a browser-based integrated development environment with just one click. You can then start hacking with only a web browser and afterwards commit your results back to the project.

What needs to be done to do the same with OpenRefine in the context of datasets?

As a user you browse a collection of datasets and with one click you have a dataset loaded as an OpenRefine project in your browser. With another click the updated data is sent back to the data provider.

As a "data provider" you can start an OpenRefine instance for just a single user/session and populate it with a given dataset. After working on this dataset there is some callback to update the dataset.

Benjamin Rosemann

Support OpenAPI in OpenRefine

More and more services document their REST endpoints using the OpenAPI specification (https://swagger.io/specification/). Using the OpenAPI specification it is quite simple to programmatically create REST requests for the documented API.

Supporting OpenAPI would allow more people to use these "technical" services without the need to program customized GUIs or export and import functionality.

Use-Cases:

- Create OpenRefine projects from REST endpoints.

- Create REST requests from columns similar to the "Add column by fetching URLs" feature in OpenRefine.

- Create or update items on a REST endpoint via an export dialog.

Challenges:

- Setting custom headers (User-Agents, ...)

- Consider API rate limits

- Support paging

- Support API authentication

- Map columns to request properties/payloads

- ...

There is some overlap with the session proposal "Supporting data upload to more platforms". The goal of this session would be to identify and define more use cases and challenges regarding the support of this specific technology. Maybe even get technical and start some first experiments.
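As a starting point for such experiments, the "map columns to request properties" challenge could be sketched roughly as follows. This is a minimal illustration in Python, not a proposal for how OpenRefine would implement it: the trimmed-down specification, the `api.example.org` server, and the helper names `find_operation` and `build_request` are all hypothetical. It only relies on the standard OpenAPI 3 layout (`servers`, `paths`, `operationId`, and `parameters` with `in: path` or `in: query`).

```python
from urllib.parse import quote, urlencode

# Hypothetical, heavily trimmed OpenAPI 3 document; a real one would be
# loaded from the JSON or YAML file published by the service.
SPEC = {
    "servers": [{"url": "https://api.example.org/v1"}],
    "paths": {
        "/items/{id}": {
            "get": {
                "operationId": "getItem",
                "parameters": [
                    {"name": "id", "in": "path", "required": True},
                    {"name": "lang", "in": "query", "required": False},
                ],
            }
        }
    },
}


def find_operation(spec, operation_id):
    """Return (HTTP method, path template, operation object) for an operationId."""
    for path, path_item in spec["paths"].items():
        for method, operation in path_item.items():
            # Path items may also hold non-operation keys such as "summary".
            if isinstance(operation, dict) and operation.get("operationId") == operation_id:
                return method.upper(), path, operation
    raise KeyError(operation_id)


def build_request(spec, operation_id, row, column_map):
    """Build (method, url) for one row.

    column_map maps an OpenAPI parameter name to a column name in the row.
    """
    method, path, operation = find_operation(spec, operation_id)
    query = {}
    for param in operation.get("parameters", []):
        column = column_map.get(param["name"])
        value = row.get(column) if column is not None else None
        if value is None or value == "":
            if param.get("required"):
                raise ValueError(f"missing required parameter {param['name']!r}")
            continue  # optional parameter with a blank cell: leave it out
        if param["in"] == "path":
            path = path.replace("{" + param["name"] + "}", quote(str(value), safe=""))
        elif param["in"] == "query":
            query[param["name"]] = str(value)
    url = spec["servers"][0]["url"].rstrip("/") + path
    if query:
        url += "?" + urlencode(query)
    return method, url
```

Calling `build_request(SPEC, "getItem", {"Identifier": "42", "Language": "de"}, {"id": "Identifier", "lang": "Language"})` would yield one request per row, much like "Add column by fetching URLs" does today. Headers, authentication, paging and rate limiting are deliberately left out of this sketch.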

I'm interested in the possibility of using OpenRefine with the open source library software FOLIO (https://folio.org), which has OpenAPI definitions for many of its APIs. My feeling is that supporting a range of authentication options out of the box might be the biggest challenge for this type of use right now, but having some built-in functionality that knows how to make requests and use the responses would be amazing.
Owen Stephens, 07.06.2024
Sheila Moussavi

OpenRefine Vision, Mission, Values Workshop

Bocoup has interviewed several members of the OpenRefine community to draft initial versions of OpenRefine’s Mission, Vision, and Values.

In this session, we will share updates on the engagement thus far, reflect on the importance of a coherent Vision that guides Mission & Values, and workshop these drafts in small groups.

Open source design workflows in OpenRefine's development ecosystem

Who can contribute to OpenRefine's design framework? What workflows would enable not only broader participation, but also more actionable and achievable design contributions? Design practice in open source products and communities is notoriously difficult to implement in equitable and effective ways, so in this session we can discuss together how this could work in the context of OpenRefine: What can we learn from design projects carried out with OpenRefine in the past, and what could a future design contribution framework for OpenRefine look like?

Making OpenRefine more useful as an exploratory tool

It'd be great to explore options to improve basic data visualization in OpenRefine. Using visualization tools as part of facets is great IMO, but the visual side could be improved, especially if we're better integrating GIS tools into OpenRefine. We could discuss the scope we're comfortable with when it comes to exploratory analyses and suggest types of visualizations/analyses we feel are lacking right now.

I've thought about this a lot over the years. OpenRefine is strong on the "Refine" part. Being "Open" means it's open source, but also always open for new ideas, integrations, etc. Apache Superset https://superset.apache.org/ is one very cool tool that could integrate with OpenRefine's "data engine", which would provide filtered data to Superset. Users would run Superset separately alongside OpenRefine. Interestingly, there are many such "live" integrations running along with OpenRefine that could be done with other tools besides Superset! I think we'd just need to improve things in OpenRefine to expose project grid data so that these kinds of integrations can happen with other tools more easily. For example, for Superset, we'd need OpenRefine to expose project data as a database dialect; this might be easiest if we had the option of storing projects in an embedded database such as H2, Derby, HSQLDB, SQLite, Ignite, etc. Putting visualizations and analysis into OpenRefine directly is not impossible, but the development and maintenance effort would be staggering. But we certainly need better support for analysis in several areas. Types, for instance, have long been available in GREL, but we don't have a standard Facet by Type (GREL: value.type()), because the assumption was that a single column would likely have the same type in all cells. Messy data does not conform to that rule (or any rules!), so we definitely can do better, perhaps even helping with extension support for AI and more semantic type discovery. Cannot wait to hear the problems folks have with Faceting and Analysis. Visualizations however might need a 1 or 2 month barcamp! :-) So for visualizations we might roll this discussion into "If only OpenRefine could be more like…"?
Thad Guidry, 02.06.2024
I take back what I said. We SHOULD have a session on just "Exploratory Analysis" and just hear everyone's ideas. So this session could be less "Refine"-ing or Cleaning and more so "Explore"-ing. I like it.
Thad Guidry, 02.06.2024
This is a long-standing issue which might be interesting to revisit in the context of this proposed session: https://github.com/OpenRefine/OpenRefine/issues/2001
Owen Stephens, 06.06.2024
Hmmm, maybe something like OR in Jupyter Notebook? Looks like someone thought about it: https://gist.github.com/psychemedia/d67e7de29a2d012183681778662ef4b6
Keven L. Ates, 10.06.2024

Bridging OpenRefine and GIS

It would be great to have improved tools to read/write common GIS formats like shapefiles. Perhaps having a closer collaboration with OSM? ArcGIS is becoming fairly open to open-source as well and they basically dominate the academic world and multiple industries. Some governments are turning to QGIS, which might be the fastest way forward for now, but it'd be interesting to hear from anyone who has specific challenges with GIS and OpenRefine.

Adding geospatial data to non-geospatial datasets can also represent a bit of a challenge for beginners when there are so many good geospatial sources out there.

Spatial data stored in HDF5 format can be used in GIS and imaging programs including QGIS, ArcGIS, and ENVI. An indirect effort that I am trying to help push along is supporting HDF5 import and export. For that, a Java HDF5 writer was needed, so I helped sponsor this nice chap after getting a thumbs up from the HDF5 Group themselves that it would be a great idea: https://github.com/jamesmudd/jhdf/issues/354. During import, an HDF5 directory structure viewer would be needed, I guess. One cool feature of HDF5 is that it supports data slicing, extracting portions of a dataset as needed so that the whole file or dataset doesn't need to be read into memory: https://www.neonscience.org/resources/learning-hub/tutorials/about-hdf5. I've also had discussions with Justin Meyers, but would love to learn more about GIS challenges. He pulls data down from around the world: https://github.com/justinelliotmeyers?tab=repositories. I really think OpenRefine can help in a few areas!
Thad Guidry, 02.06.2024

Improving OpenRefine contributor pathways: Roles, Permissions, and Processes

Our ongoing discussions throughout 2023 and early 2024 (https://forum.openrefine.org/t/improving-the-onboarding-process-for-new-contributors/882) have highlighted the need for a more systematic approach to managing our contributors. This session will serve as a platform to share our ideas and discuss how we can formalize the process. We will look into the following points:

1. Defining Contributor Roles: What are the various contributor roles we should recognize and encourage within our community? How can we ensure that each role is clear and meaningful?

2. Setting Permission Levels: What should the permission levels be for different stages of involvement (e.g., new contributor vs. long-term committer)? How can these levels help in managing community contributions more effectively?

3. Managing Permissions: What processes should we implement for granting and revoking permissions? What criteria should be used to ensure fairness and transparency in these decisions?

What does an Extension Developer role look like as it is generally outside of main OpenRefine development? Extension developers may want or need closer ties to the other contributors. We also need an overhaul for the extension tutorial.
Keven L. Ates, 01.05.2024
Antonin Delpeuch

If only OpenRefine could be more like…

In this session, all participants are invited to bring up projects that OpenRefine should take inspiration from. This could relate to all sorts of aspects:

- user interface ("tool X feels much less clunky than OpenRefine")

- features ("I always miss this feature from tool Y when I work with OpenRefine")

- project structure ("OpenRefine should be made by a single person / only volunteers / only paid staff / a company like project X")

- documentation ("it's much easier to learn / teach tool Y because…")

- contribution workflows ("I prefer contributing to project X because…")

- any other aspect you can think of!

In a first brainstorming phase, we will gather wishes and group them by area. Then, for each wish, we will invite the participant who expressed it to briefly explain (and possibly demonstrate onscreen, if doable) their wish.

The outcomes of this session will be documented in the minutes, which will be posted on the forum.

For documentation, the extension documentation needs a long-overdue overhaul. What will this look like for a new version of OpenRefine?
Keven L. Ates, 01.05.2024
Antonin Delpeuch

Supporting data upload to more platforms

OpenRefine currently supports data upload to Wikibase and, via extensions, to other targets such as RDF databases (through the RDF Transform extension) or SNAC. More such extensions (or forks) have been developed in the past, with varied maintenance status. A recent discussion with the GND community explored the idea of building a similar integration to populate the authority file of the German National Library.

How can we enable more such integrations? Are there specific integrations we want to prioritize? Should those be developed inside the OpenRefine project or outside, as extensions?

The goal of this session is to explore those questions together: mapping the ecosystem of existing and potential integrations, reflecting on what their common needs are and how OpenRefine can meet them better.

I've explored an idea for executing SPARQL queries and updates directly from OpenRefine to pull and push data. How can we make SPARQL queries easier in OpenRefine via expressions? Can we use a SPARQL query as a source, i.e., import data from a triple store using SPARQL? Can we export the data directly to a triple store via a SPARQL update?
Keven L. Ates, 01.05.2024
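To make the "SPARQL query as a source" question above a little more concrete: endpoints that implement the SPARQL 1.1 Protocol can return SELECT results in the SPARQL 1.1 Query Results JSON Format, which maps fairly directly onto a project grid. The following is only a rough sketch in Python under that assumption; the function name `bindings_to_rows` and the sample data are hypothetical, and the HTTP request itself is left out.

```python
import json


def bindings_to_rows(results_json):
    """Flatten a SPARQL 1.1 JSON results document into (columns, rows)."""
    doc = json.loads(results_json)
    columns = doc["head"]["vars"]
    rows = []
    for binding in doc["results"]["bindings"]:
        # Unbound variables are simply absent from a binding: use an empty cell.
        rows.append([binding.get(var, {}).get("value", "") for var in columns])
    return columns, rows


# Hypothetical response body, shaped like a SELECT ?item ?label result.
SAMPLE = """{
  "head": {"vars": ["item", "label"]},
  "results": {"bindings": [
    {"item": {"type": "uri", "value": "http://example.org/1"},
     "label": {"type": "literal", "value": "First"}},
    {"item": {"type": "uri", "value": "http://example.org/2"}}
  ]}
}"""

columns, rows = bindings_to_rows(SAMPLE)
```

Each query variable becomes a column and each binding a row; datatype and language information carried in the bindings is dropped here, which a real importer would probably want to keep.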
Thad Guidry

Strategizing our Roadmap for user needs

It would be nice to have someone go over the general roadmap of OpenRefine at the start. A final session on the last day might be where we summarize the consensus and put the Roadmap plan into a GitHub Project (perhaps with milestones). From my personal viewpoint, it seems we just need to get consensus on priorities, which seem shifty to our users. We do already have some forum posts on Roadmap discussions, but for many users the Roadmap feels disconnected: "what is possible" versus "what do users need" versus the actually realized "what can we get funded to actually work on despite our users' needs".

(A sub-component of this discussion is how best to present dependencies of Roadmap items to users. Many other projects on GitHub use milestone partitioning instead of "Tracks" and "Tracked by" fields. For example: "4.0-first", "4.0-second", "4.0-extensions-first", etc., so that it is clear the "first" issues need to be worked on first because the "second" issues depend on them.)

(Another sub-component of this discussion is adding new fields to the GitHub Project tables to enhance Roadmap visibility.)

I am very interested in what the 4.0 extension process / connector will look like. This needs solid documentation and examples. A comprehensive catalog of the internal hooks is needed to help guide extension developers on their availability and use. It feels like there are hidden hooks that require a deep dive into the main OpenRefine code to find them. Library dependency issues occur when extension libs conflict with main OpenRefine libs. Can an extension override an OpenRefine lib when certain "newer" functionality is needed by the extension? Should this be sandboxed in some way? Also, Java version guidance is / was an issue. Apparently, some end users are confused about the need to install updated Java versions to run OpenRefine.
Keven L. Ates, 01.05.2024