OpenRefine 2024 BarCamp

Meeting retrospective

added by Tom Morris on 18.06.2024

Review what worked well and what could be improved for attendees (Berlin + rest of world), audience, stakeholders, etc.

Scheduling note - this would fit best either at the end of the con or afterwards

0 Votes

Using OpenRefine with Wikidata, Wikibase and Wikimedia Commons

added by Owen Stephens on 17.06.2024

A session to look at how people use OpenRefine with Wikibase, Wikidata, and Wikimedia Commons: what are the key use cases, key functions and key workflows?

Explore what needs improving and where the wiki* use cases overlap and diverge from other types of OpenRefine usage

11 Votes

I'd be interested in attending this session

I'm in

I'd be interested in attending this session

Interested.

I would like to attend this session (remotely)

Interested in synergy between OpenRefine & Wikidata

Approaches to training people to use OpenRefine

added by Owen Stephens on 17.06.2024

I regularly deliver OpenRefine training, specifically to librarians, and I also wrote the original Library Carpentry OpenRefine course and I continue as a maintainer for that course.

I'm interested in both sharing my experience and sharing the experience of others as to good approaches to offering training on a tool which has so many possible functions and use cases

11 Votes

Important, I would like to hear and share my experience as well

I'm happy to facilitate this session

I am interested

I'd be interested in attending this session

I would like to attend this session (remotely)

I'd be interested in attending this session

Hi Owen, I am Franziska, working for the Natural History Museum in Berlin. We are looking for a person who can give our collection managers some training in cleaning and sorting their collection data. Would it be possible to arrange a session with you?

Presentation of the project status

added by Martin Magdinier on 17.06.2024

Presenting the status of OpenRefine's grants, and the funding and team structure plan for the next 12 to 18 months

8 Votes

Reconciliation in OpenRefine

added by Ayushi on 17.06.2024

This session demonstrates the workflow of the reconciliation service in OpenRefine along with recent improvements to it (before vs. after), both in the UI and in the error display. It will also be a brief account of my experience working on OpenRefine, both as an Outreachy intern and as a contractor, along with various improvements that could be made in future for even better reconciliation!

14 Votes

I'd be interested in attending this session

I would like to join discussing user experiences / modalities related to reconciliation. I hope it's related!

Interested attending

Interested.

interested

Interested

I'm unlikely to make a 4am session (even if our city hadn't just won the world championship), but fortunately my input is easy to provide in asynchronous textual format. 1. "Reconciliation" is just a fancy name for search 2. Wikidata is, by far, the most generally useful data source to reconcile against and extend from for the vast majority of OpenRefine users 3. Wikidata would benefit greatly from a broad ecosystem of contributors using OpenRefine who can find duplications, gaps, etc in Wikidata 4. The Wikidata Search team should support a production quality OpenRefine reconciliation service The OpenRefine community should deploy its political capital in support of achieving #4 with the recognition that after almost a decade of resistance by Wikidata, it won't be an easy task.

Well, I guess it would have been a little easier if formatting was preserved... :(

How to leverage OpenRefine for a variety of user experiences?

added by Susanna Ånäs on 10.06.2024

OpenRefine has developed robust capabilities for key operations for data import, wrangling and ingestion for several platforms. Different user segments have needs that the technical, complex and unintuitive interfaces cannot meet.

Coming from the Wikimedia / GLAM partnerships domain, I would love to see how different modalities could be used for reconciliation: geographic comparison, comparison of source materials, using visual clues, using exact matches via authority IDs, AI suggestions, use of related information etc.

It is obvious that OR development cannot meet all the possible needs of all the user segments, but what kind of architecture would allow building on top of OpenRefine, or connect with capabilities of OpenRefine to create tools for specific purposes while taking advantage of all current capabilities? What kind of production structure would be needed, who should coordinate and fund that?

Even if this overlaps with other proposals, the point of this proposal is to focus on facilitating a variety of user experiences.

3 Votes

Deleting this proposal in favor of several others: - Reconciliation - Wikimedia - Demos + more

OpenRefine as a service

added by Benjamin Rosemann on 07.06.2024

When you browse GitHub or GitLab you can start a browser based integrated development environment with just one click. You can then start hacking with only a web browser and afterwards commit your results back to the project.

What needs to be done to do the same with OpenRefine in the context of datasets?

As a user you browse a collection of datasets and with one click you have a dataset loaded as OpenRefine project in your browser. With another click the updated data is sent back to the data provider.

As a "data provider" you can start an OpenRefine instance for just a single user/session and populate it with a given dataset. After working on this dataset there is some callback to update the dataset.

6 Votes

I am interested

Interested

Interested.

Interested; Can do note-taking

Support OpenAPI in OpenRefine

added by Benjamin Rosemann on 07.06.2024

More and more services document their REST endpoints using the OpenAPI specification (https://swagger.io/specification/). Using the OpenAPI specification it is quite simple to programmatically create REST requests for the documented API.

Supporting OpenAPI would allow more people to use these "technical" services, without the need for programming customized GUIs, export and/or import functionalities.

Use-Cases:

- Create OpenRefine projects from REST endpoints.

- Create REST requests from columns similar to the "Add column by fetching URLs" feature in OpenRefine.

- Create or update items on a REST endpoint via an export dialog.

Challenges:

- Setting custom headers (User-Agents, ...)

- Consider API rate limits

- Support paging

- Support API authentication

- Map columns to request properties/payloads

- ...

There is some overlap with the session proposal "Supporting data upload to more platforms". The goal of this session would be to identify and define more use cases and challenges regarding the support of this specific technology. Maybe even get technical and start some first experiments.
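As a starting point for such experiments, the use case "create REST requests from columns" could be sketched roughly as follows. This is only an illustration under stated assumptions: the spec fragment, the `api.example.org` server, the `build_request` helper and the column names are all hypothetical, nothing here is existing OpenRefine functionality, and no request is actually sent. Rate limits, paging and authentication are left out on purpose.

```python
from urllib.parse import quote, urlencode

# Hypothetical, hand-written fragment in the shape of an OpenAPI 3 document.
SPEC = {
    "servers": [{"url": "https://api.example.org/v1"}],
    "paths": {
        "/items/{itemId}": {
            "get": {
                "parameters": [
                    {"name": "itemId", "in": "path", "required": True},
                    {"name": "lang", "in": "query", "required": False},
                ]
            }
        }
    },
}


def build_request(spec, path, method, row, column_map, headers=None):
    """Describe one HTTP request for one row, without sending it.

    column_map maps OpenAPI parameter names to column names in the row.
    """
    operation = spec["paths"][path][method]
    url_path = path
    query = []
    for param in operation.get("parameters", []):
        name = param["name"]
        value = row.get(column_map.get(name, name))
        if value is None or value == "":
            if param.get("required"):
                raise ValueError(f"missing required parameter: {name}")
            continue  # optional parameter with a blank cell: leave it out
        if param["in"] == "path":
            # Percent-encode the cell value before placing it in the path.
            url_path = url_path.replace("{" + name + "}", quote(str(value), safe=""))
        elif param["in"] == "query":
            query.append((name, str(value)))
    url = spec["servers"][0]["url"].rstrip("/") + url_path
    if query:
        url += "?" + urlencode(query)
    return {"method": method.upper(), "url": url, "headers": dict(headers or {})}


# Two example rows; the second has a blank "language" cell.
rows = [{"id": "Q 42", "language": "en"}, {"id": "Q64", "language": ""}]
requests_to_send = [
    build_request(
        SPEC, "/items/{itemId}", "get", row,
        column_map={"itemId": "id", "lang": "language"},
        headers={"User-Agent": "openrefine-sketch/0.1"},  # custom header
    )
    for row in rows
]
for req in requests_to_send:
    print(req["method"], req["url"])
```

Keeping request construction separate from sending would make it easier to add throttling, paging or authentication later as independent steps.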

12 Votes

I'm interested in the possibility of using OpenRefine with the open source library software FOLIO (https://folio.org) which has OpenAPI definitions for many of its APIs. My feeling is that supporting a range of authentication options out of the box might be the biggest challenge for this type of use right now, but having some built in functionality to know how to do requests and use the response would be amazing

An important point to remember is that OpenRefine doesn't support a public API (yet). So the first step would be to provide a supported public API.

I'd be interested in attending this session

I am interested

I can do note-taking

interested

joining

I'd be interested in attending this session

interested

Interested.

Interested in APIs in general

OpenRefine Vision, Mission, Values Workshop

added by Sheila Moussavi on 06.06.2024

Bocoup has interviewed several members of the OpenRefine community to draft initial versions of OpenRefine’s Mission, Vision, and Values.

In this session, we will share updates on the engagement thus far, reflect on the importance of a coherent Vision that guides Mission & Values, and workshop these drafts in small groups.

10 Votes

I'd be interested in attending this session

Interested.

Open source design workflows in OpenRefine's development ecosystem

added by Lozana Rossenova on 16.05.2024

Who can contribute to OpenRefine's design framework? What workflows would enable not only broader participation, but also more actionable and achievable design contributions? Design practice in open source products and communities is notoriously difficult to implement in equitable and effective ways, so with this session we can discuss together how this could work in the context of OpenRefine: What can we learn from design projects carried out with OpenRefine in the past, and what could a future design contribution framework for OpenRefine look like.

5 Votes

What is the OpenRefine community expecting from designers? Overall UX when integrating OpenRefine in a workflow, or screen design? How do we get people with a design background familiar enough with the tool?

I am interested

I'm interested

Making OpenRefine more useful as an exploratory tool

added by Julie Faure-Lacroix on 02.05.2024

It'd be great to explore options to improve basic data visualization in OpenRefine. Using visualization tools as part of facets is great IMO but the visual side could be improved, especially if we're better integrating GIS tools with OpenRefine. We could discuss the scope we're comfortable with when it comes to exploratory analyses and suggest types of visualizations/analyses we feel are lacking right now.

10 Votes

I've thought about this a lot over the years. OpenRefine is well suited to the "Refine" part. Being "Open" means it's open source, but also always open for new ideas, integrations, etc. Apache Superset https://superset.apache.org/ is one very cool tool that could integrate with OpenRefine's "data engine", which would provide filtered data to Superset. Users would run Superset separately alongside OpenRefine. Interestingly, there are many such "live" integrations running along with OpenRefine that could be done with other tools besides Superset! I think we'd just need to improve things in OpenRefine to expose project grid data for these kinds of integrations to happen with other tools more easily. For example, for Superset, we'd need OpenRefine to expose project data as a database dialect. This might be easiest if we had an alternative storage option backed by an embedded database such as H2, Derby, HSQLDB, SQLite, Ignite, etc. Putting visualizations and analysis into OpenRefine directly is not impossible, but the development and maintenance effort is staggeringly huge. But we certainly need better support for analysis in several areas. Types, for instance, have long been available in GREL, but we don't have a standard Facet by Type (GREL: value.type()), because the assumption was that a single column would likely have the same type in all cells. Messy data does not conform to that rule (or any rules!), so we definitely can do better, even perhaps helping with extension support for AI and more semantic type discovery. Cannot wait to hear the problems folks have in better Faceting and Analysis. Visualizations however might need a 1 or 2 month barcamp! :-) So for visualizations we might roll this discussion into "If only OpenRefine could.." ?
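To make the "Facet by Type" idea above concrete, here is a minimal sketch in Python rather than GREL. The type labels and the `cell_type` and `facet_by_type` helpers are assumptions made for illustration, not OpenRefine's actual type system or API; the point is only that counting cells per type quickly exposes a mixed column.

```python
from collections import Counter
from datetime import date, datetime


def cell_type(value):
    """Return a coarse type label for one cell (labels are assumed)."""
    if value is None or value == "":
        return "blank"
    if isinstance(value, bool):  # check bool before int: bool is an int subclass
        return "boolean"
    if isinstance(value, (int, float)):
        return "number"
    if isinstance(value, (date, datetime)):
        return "date"
    return "string"


def facet_by_type(column):
    """Count how many cells of each type a column contains."""
    return Counter(cell_type(v) for v in column)


# A deliberately messy column: numbers stored as text, blanks, a date, a flag.
messy = ["12", 12, 3.5, "", None, date(2024, 6, 18), "n/a", True]
facet = facet_by_type(messy)
for label, count in sorted(facet.items()):
    print(label, count)
```

A facet like this would let a user select, say, only the "string" cells in a column that is supposed to be numeric and fix them in one pass.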

I take back what I said. We SHOULD have a session on just "Exploratory Analysis" and just hear everyone's ideas. So this session could be less "Refine"-ing or Cleaning and more so "Explore"-ing. I like it.

This is a long standing issue which might be interesting to revisit in the context of this proposed session https://github.com/OpenRefine/OpenRefine/issues/2001

Hmmm, maybe something like OR in Jupyter Notebook? Looks like someone thought about it: https://gist.github.com/psychemedia/d67e7de29a2d012183681778662ef4b6

I'd be interested in attending this session

Interested.

Bridging OpenRefine and GIS

added by Julie Faure-Lacroix on 02.05.2024

It would be great to have improved tools to read/write common GIS formats like shapefiles. Perhaps having a closer collaboration with OSM? ArcGIS is becoming fairly open to open-source as well and they basically dominate the academic world and multiple industries. Some governments are turning to QGIS, which might be the fastest way forward for now, but it'd be interesting to hear from anyone who has specific challenges with GIS and OpenRefine.

Adding geospatial data to non-geospatial datasets can also represent a bit of a challenge for beginners when there are so many good geospatial sources out there.

8 Votes

Spatial data that are stored in HDF5 format can be used in GIS and imaging programs including QGIS, ArcGIS, and ENVI. An indirect effort that I am trying to help push along is supporting HDF5 import and export. But there needed to be a Java HDF5 writer, so I helped sponsor this nice chap after getting thumbs up from HDF5 Group themselves that it would be a great idea https://github.com/jamesmudd/jhdf/issues/354 During import, an HDF5 directory structure viewer would be a need I guess. One cool feature is that it supports data slicing, extracting portions of a dataset as needed so that the whole file or dataset doesn't need to be read into memory. https://www.neonscience.org/resources/learning-hub/tutorials/about-hdf5 I've had discussions with Justin Meyers also but would love to learn more about GIS challenges. He pulls data down from around the world https://github.com/justinelliotmeyers?tab=repositories I really think OpenRefine can help in a few areas!

Adding geographic data is a 'basic task' in many projects/courses I teach, would be interesting to understand more about this subject

Interested.

Improving OpenRefine contributor pathways: Roles, Permissions, and Processes

added by Martin Magdinier on 26.04.2024

Our ongoing discussions throughout 2023 and early 2024 (https://forum.openrefine.org/t/improving-the-onboarding-process-for-new-contributors/882) have highlighted the need for a more systematic approach to managing our contributors. This session will serve as a platform to share our ideas and discuss how we can formalize the process. We will look into the following points:

1. Defining Contributor Roles: What are the various contributor roles we should recognize and encourage within our community? How can we ensure that each role is clear and meaningful?

2. Setting Permission Levels: What should the permission levels be for different stages of involvement (e.g., new contributor vs. long-term committer)? How can these levels help in managing community contributions more effectively?

3. Managing Permissions: What processes should we implement for granting and revoking permissions? What criteria should be used to ensure fairness and transparency in these decisions?

11 Votes

What does an Extension Developer role look like as it is generally outside of main OpenRefine development? Extension developers may want or need closer ties to the other contributors. We also need an overhaul for the extension tutorial.

Existing thread https://forum.openrefine.org/t/requesting-feedback-documenting-openrefine-community-handbooks/1224/ and https://forum.openrefine.org/t/improving-the-onboarding-process-for-new-contributors/882/5

This session will be more on how we engage with the different pathways and not the content of a specific contribution type (documentation, designer, extension developer)

Interested.

If only OpenRefine could be more like…

added by Antonin Delpeuch on 19.04.2024

Quick demos of other tools that OpenRefine should take inspiration from

12 Votes

For documentation, the extension documentation badly needs an overhaul. What will this look like for a new version of OpenRefine?

I'm now wondering if the scope of this session isn't too broad. Perhaps it's better to have more focused sessions where people will naturally come up with examples of what other projects do better.

We can do a quick 5-minute demo of different tools we can learn from.

I'd be interested in attending this session

As a Big Ideas session, I'm interested.

Joining

I am interested

I can facilitate

interested

Joining

interested

I'd be interested in attending this session

Will be joining :)

Supporting data upload to more platforms

added by Antonin Delpeuch on 19.04.2024

OpenRefine currently supports data upload to Wikibase, but also, for instance, to RDF databases via the RDF Transform extension or to SNAC, also via an extension. More such extensions (or forks) have been developed in the past, with varied maintenance status. A recent discussion with the GND community explored the idea of building a similar integration to populate the authority file of the German national library.

How can we enable more of such integrations? Are there specific integrations we want to prioritize? Should those be developed inside the OpenRefine project or outside, as extensions?

The goal of this session is to explore those questions together: mapping the ecosystem of existing and potential integrations, reflecting on what their common needs are and how OpenRefine can meet them better.

8 Votes

I've explored an idea for executing SPARQL queries and updates directly from OpenRefine to pull and push data. How can we make SPARQL queries easier in OpenRefine via expressions? Can we use a SPARQL query as a source, i.e., import data from a triple store using SPARQL? Can we export the data directly to a triple store via a SPARQL update?
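For the "SPARQL query as a source" question, one small piece can be sketched independently of any endpoint: turning a response in the standard SPARQL 1.1 Query Results JSON format into tabular rows that could become a project grid. The sample response, the variable names and the `bindings_to_rows` helper below are made up for illustration; fetching the results and running SPARQL updates are out of scope for this sketch.

```python
import json

# Made-up response in the SPARQL 1.1 Query Results JSON format.
SAMPLE = json.loads("""
{
  "head": {"vars": ["item", "itemLabel", "coord"]},
  "results": {"bindings": [
    {"item": {"type": "uri", "value": "http://example.org/item/1"},
     "itemLabel": {"type": "literal", "xml:lang": "en", "value": "First item"},
     "coord": {"type": "literal", "value": "Point(13.4 52.5)"}},
    {"item": {"type": "uri", "value": "http://example.org/item/2"},
     "itemLabel": {"type": "literal", "xml:lang": "en", "value": "Second item"}}
  ]}
}
""")


def bindings_to_rows(results):
    """Flatten SPARQL JSON results into (column names, list of rows).

    Variables that are unbound in a solution become empty cells.
    """
    columns = results["head"]["vars"]
    rows = [
        [binding.get(var, {}).get("value", "") for var in columns]
        for binding in results["results"]["bindings"]
    ]
    return columns, rows


columns, rows = bindings_to_rows(SAMPLE)
print(columns)
for row in rows:
    print(row)
```

Unbound variables are common with OPTIONAL patterns, so an importer would need to decide whether they map to blank cells, as assumed here, or to something else.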

Strategizing our Roadmap for user needs

added by Thad Guidry on 04.04.2024

It would be nice to have someone go over the general roadmap of OpenRefine at the start. A final session on the last day might be where we summarize the consensus and put the Roadmap plan into a GitHub Project (perhaps with milestones). From my personal viewpoint, it seems we just need to get consensus on priorities, which seem to keep shifting from our users' point of view. We indeed already have some forum posts on Roadmap discussions, but for many users, it feels disconnected from "what is possible" versus "what do users need" versus the actually realized "what can we get funded to actually work on despite our users' needs".

(a sub-component of this discussion is how best to present dependencies of Roadmap items to users? Many other projects in GitHub use milestone partitioning, instead of Tracks and Tracked by fields. For example: "4.0-first", "4.0-second" "4.0-extensions-first", etc. in order to know that the "first" issues need to be worked first because "second" issues have dependency on the "first".)

(another sub-component of this discussion is adding new fields to the GitHub Project tables to enhance Roadmap visibility)

7 Votes

I am very interested in what the 4.0 extension process / connector will look like. This needs solid documentation and examples. A comprehensive catalog of the internal hooks is needed to help guide extension developers on their availability and use. It feels like there are hidden hooks that require a deep dive into the main OpenRefine code to find them. Library dependency issues occur with conflicts between extension libs and main OpenRefine libs. Can an extension override an OpenRefine lib when certain "newer" functionality is needed by the extension? Should this be sandboxed in some way? Also, Java version guidance is / was an issue. Apparently, some end users are confused about the need to install updated Java versions to run OpenRefine.

Following today's presentation, I suggest working from a high-level status with the goalposts we want to achieve as a community. This way we can identify volunteers or grants available to make them happen.

I am interested

I can do note-taking
